The Savvy Aviator #22: The Art Of Troubleshooting
Fixing a problem is usually the easy part of aircraft maintenance. The hard part is figuring out what's wrong. A little troubleshooting can often save you a small fortune in unneeded parts and labor.
I pulled my Cessna T310R out of the hangar, and performed the usual pre-flight walk-around. I climbed into the left seat, watched my two passengers take their seats and fasten their seat belts, and checked that the cabin door was properly secured. Pre-start checklist: flight controls free and correct, switches and dimmers off, circuit breakers checked, cowl flaps open, alternate air closed ...
It was the Friday before New Year's Day, and we were headed up to the Bay Area for the day. The weather was gorgeous (CAVU), and a glance at the panel clock showed we were right on schedule for an on-time departure. All seemed right with the world.
... battery and alternator switches on, voltammeter switch to VOLTS position, check bus voltage ... huh?
"Aw, shoot," I said (or words to that effect). Awwww ... SHOOT!!!"
"What's wrong?" asked my right-seat passenger, who happened also to be a fellow pilot, aircraft owner and A&P. His wife in the back seat -- a non-pilot -- said nothing but put down her book and pricked up her ears in obvious concern.
[Figure: C310 voltammeter]
"Look at the voltmeter," I said, pointing to the small rectangular instrument located just below the avionics stack. "We should have 24 volts ... 22 volts is minimum for start ... the meter shows only 16 volts ... we'll never be able to crank the engines with battery voltage that low."
I realized it would take at least several hours on the battery charger before we could fly. How could this happen? I'd flown the airplane only last week. I hadn't left the battery switch on, nor any of the hot-wired courtesy lights. I could think of no reason that the battery voltage should be less than near-full charge. I wiggled the voltammeter switch, but the voltage reading remained at a pitifully inadequate 16 volts. I saw our carefully crafted timetable for the day unraveling before my eyes. Arrrrgh!
According to the Cessna 310 POH, a minimum of 22 volts is required for engine start. It looked like we weren't even close to having that. Over the years, however, I've developed the habit of being skeptical of any single instrument indication unless I can confirm it through independent means. So I pressed the left-engine starter button, expecting to hear at most an anemic starter solenoid click and nothing more.
To my astonishment, the left engine cranked vigorously and started after just a few blades of rotation! The battery voltage was obviously fine. The voltammeter had been lying.
The right engine then started just as easily. With both engines running, the "LOW VOLT" annunciator lamp extinguished, indicating that bus voltage was now greater than 25.5 volts.
The voltammeter now read 18 volts. Clearly, the voltammeter was lying!
My mind momentarily flashed back nine months to a technical support phone call I received from the owner of a Cessna 340A who was obviously at his wits' end. He explained that the voltammeter in his airplane (which was exactly like mine) was not working properly when the meter switch was in the "R ALT" position.
"My mechanic replaced the selector switch, the meter, the right alternator shunt and the meter fuses," the owner explained, "but the meter still doesn't work." He explained that he'd already spent a small fortune trying to resolve this problem, and his A&P had run out of ideas.
It always disturbs me to receive a call like this because the owner could have been spared a great deal of expense and anguish if only he'd called earlier -- before his shop began to "shotgun" the problem by replacing one expensive component after another.
I spent about 15 minutes outlining a logical fault-isolation procedure to the owner, and asked him to check several things and call me back. He did, and we quickly determined that there wasn't anything wrong with his voltammeter. The meter was reading zero in the "R ALT" position because the right alternator was, in fact, not putting out any current!
This owner had told his mechanic that the voltammeter wasn't working. His mechanic took the owner's report literally, failed to verify the owner's diagnosis, and instead charged the owner for hundreds of dollars' worth of parts and labor trying to fix a voltammeter problem that didn't exist, rather than focusing on the alternator problem that did exist. What a pity.
Back To The Present
I was satisfied, however, that the situation on my airplane was the opposite. The battery voltage was clearly fine, otherwise it wouldn't have cranked the engines so briskly. With the engines running, the alternators were clearly functioning, otherwise the LOW VOLT light wouldn't have extinguished. The battery and charging systems were working fine.
The voltammeter was lying. But why?
Unconsciously and instinctively, I'd switched from "pilot mode" to "mechanic mode" in which my focus was verifying and isolating the nature of the problem.
I checked the other three positions of the voltammeter switch. In the "BAT" and "R ALT" positions, the meter read zero. Those readings didn't make much sense, either, since the battery should be charging. In the "L ALT" position, the meter read a bit over 10 amps, which seemed about right under the circumstances.
I switched the left alternator field switch off. The LOW VOLT light remained off, proving that the right alternator had picked up the full load and was functioning fine. With the meter switch in L ALT position, the meter reading dropped to zero, as expected. But the meter readings in R ALT and BAT positions were also zero, which made no sense. And the VOLTS reading remained 18 volts, which was clearly incorrect.
Switching back to "pilot mode," I decided that we could safely and reasonably proceed with our flight to the Bay Area. (Had it been IMC, my decision might well have been different.)
That was the good news. The bad news was that I'd be faced with a non-trivial troubleshooting problem once we returned.
There are a number of different approaches to troubleshooting. Each has its place.
In the Trial-and-Error (or "Shotgun") approach, the technician replaces one component of the faulty system after another until the problem disappears. This approach often makes good sense in airline and commercial fleet operations where downtime must be minimized at any cost, and replacement parts are readily available off-the-shelf. In the case of owner-flown GA aircraft, however, a little extra downtime is often more tolerable than a sky-high repair bill. While trial-and-error is occasionally unavoidable, I've long felt that there's way too much unnecessary parts swapping going on in GA maintenance shops.
Some mechanics believe it's easier for owners to accept paying for replacement parts than for troubleshooting labor, and they act accordingly. I don't know about you, but it absolutely infuriates me to spend good money on a costly replacement part that turns out not to fix the problem.
In another approach, which I'll call Educated Guesswork, the technician relies on experience to make an educated guess about what's wrong. "I've seen these spurious overvoltage trips many times before, and I've found that usually they're due to a bad regulator -- so I replaced your regulator and I'm guessing it'll solve the problem." If the mechanic guessed right, the owner is happy. If not, the airplane returns to the shop for another guess -- and the strategy degenerates into trial-and-error on the installment plan. (After three or four iterations that don't succeed in resolving the problem, the exasperated owner often calls someone like me for help.)
Educated guesswork is OK if the parts replaced are cheap and easy to change. But in aviation, "cheap part" is an oxymoron, and even a $15.00 sparkplug costs between $50.00 and $100.00 in labor to replace by the time you include de-cowling and re-cowling the engine. Personally, before I spend a couple of hundred bucks to replace a part on my airplane, I'd like to be pretty darn sure that the part actually needs replacing. Maybe that's just me.
The Diagnostic Approach
Consequently, I've become something of a nut on the subject of the Diagnostic Approach to troubleshooting, whose mantra is "don't try to fix a problem until you're sure you know its cause." An obvious corollary is "never replace a costly part until you've proven the part is faulty." Under this philosophy, the maintenance process is divided into two distinct phases -- diagnosis and correction -- performed strictly in that order.
The tools used in the diagnostic phase are not screwdrivers, pliers and wrenches, but rather service and parts manuals, schematic diagrams, test instruments, and brains ... especially brains. Good diagnosis consists overwhelmingly of headwork, not handwork. It's cerebral, not physical.
In a perfect world, doctors and aviation maintenance technicians would be both brilliant diagnosticians and skilled therapists. In real life, this isn't always (or even often) the case. The best person to diagnose a problem with your aircraft isn't always the same person who fixes the problem. (As a tech rep for Cessna Pilots Association and a frequent speaker on aircraft maintenance topics, I seldom get to swing wrenches on other owners' airplanes; but I assist many hundreds of owners with problem diagnosis each year.) To my way of thinking, owners need to get actively involved in the diagnosis of problems that arise with their aircraft, even if they themselves never do any wrench swinging.
Good problem diagnosis is a blend of science and art, logic and judgment, deduction and intuition. To some extent, troubleshooting skills are universal: If you're skilled at troubleshooting aircraft problems, you can probably do a decent job troubleshooting problems with clocks, lawnmowers, computers or plumbing. The best troubleshooters are the ones who have a deep understanding of the systems they're troubleshooting, but you can usually compensate for lack of experience through diligent study ... if you don't know the answer, look it up or ask somebody who does.
Identify, Verify, Reproduce
The first step in troubleshooting any problem is to identify the problem and verify that what you think is wrong is actually wrong. This may seem so obvious as to be hardly worth mentioning, but it's amazing how often this essential step is omitted.
The tech support call from the Cessna 340A owner I talked about earlier is a perfect example. The owner told his mechanic that there was a problem with the voltammeter, and the mechanic proceeded to try to fix the voltammeter system (by replacing most of the parts in that system at great expense) without ever verifying that the system actually had a problem. In the end, it turned out the voltammeter system had not been faulty at all, and the efforts to fix it were completely wasted.
Similarly, the problem I had with my T310R that morning in late December appeared to me at first to be a nearly dead battery. Had I not been a compulsive troubleshooter, I might well have simply brought the plane into the shop and told the mechanic "I had to scrub this morning's flight because the battery voltage measures only 16 volts, even though I flew the plane just a week ago and didn't leave anything on." It's entirely possible that the mechanic would then have taken me at my word and replaced the battery. A few days later, I'd have gone to the shop to pay the $350.00 repair bill and pick up my airplane. When I climbed into the pilot's seat and turned on the battery switch, the meter would have read 16 volts and I'd have been one unhappy camper.
One good way to help avoid problems like this is to make a habit of showing problems to your mechanic rather than just telling him about them. When you show the symptoms to your A&P, maybe he'll draw different conclusions from them than you did. If so, take a few minutes to discuss the situation. See if you can come to agreement on what the problem is before you hand over the keys and sign the work order.
If you can't make the problem happen for your mechanic (isn't that always the way it goes?) you should probably think twice before putting your aircraft in the shop. It's generally unreasonable to expect a mechanic to fix a problem he can't reproduce. But he'll probably try anyway, in an attempt to keep you happy (mechanics hate to admit that they can't fix something), and you'll wind up with the invoice.
Gather Fault-Isolation Data
Once the problem has been identified and verified, it's time to start collecting data that will hopefully lead to a diagnosis. Try to narrow the possible causes by testing all possible modes or control combinations that might affect the system in question. Change one thing at a time and see what the effect is (if any) on the problem. Make a detailed record of everything you do so you can go over it later.
Suppose you start hearing a strange noise on your comm radio. Here are some of the fault-isolation questions you might try to answer:
- Does it occur on all comm frequencies or just certain ones?
- Does it occur on both comm radios or just one? Does it affect the nav radios or just the comms?
- Does the noise change as you change engine RPM?
- Does it go away if you shut off an alternator?
- How about if you shut off a magneto?
- How about if you shut off the strobe?
- Can you hear it through the cabin speaker, or only through your headset?
- Do you have another headset you can try?
And so on.
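The change-one-thing-at-a-time discipline behind those questions can be sketched in a few lines of Python. This is only an illustration: the switch names, the baseline configuration, and the `noise_present` function (a stand-in for what you actually hear in the headset) are all hypothetical and not taken from any real aircraft.

```python
# Hypothetical sketch: one-factor-at-a-time fault isolation for radio noise.
# Every name here (the switches, the simulated noise source) is illustrative.

BASELINE = {
    "left_alternator": True,
    "right_alternator": True,
    "left_magneto": True,
    "strobe": True,
}


def noise_present(config):
    """Stand-in for the pilot's observation; here the strobe is the pretend culprit."""
    return config["strobe"]


def isolate(baseline, observe):
    """Switch off one item at a time and log whether the symptom changes."""
    log = []
    baseline_symptom = observe(baseline)
    for item in baseline:
        trial = dict(baseline)  # copy, so only one thing differs from the baseline
        trial[item] = False
        symptom = observe(trial)
        log.append((item, symptom, symptom != baseline_symptom))
    return log


results = isolate(BASELINE, noise_present)
for item, symptom, changed in results:
    print(f"{item} off: noise={'yes' if symptom else 'no'}, changed={changed}")

# Items whose removal changed the symptom are the ones worth a closer look.
suspects = [item for item, _, changed in results if changed]
```

The point of copying the baseline on every pass is the same as in the cockpit: if you change two things at once and the noise goes away, you still don't know which one did it. The log doubles as the detailed record you can go over later.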
Once you've tried everything you can think of to isolate the problem, it's time to develop a theory -- or perhaps several alternate theories -- about what kind of component failure(s) could be responsible for producing the observed symptoms. Here's where the serious headwork begins.
At this stage, it's absolutely essential to have a thorough understanding of how the system works and what components are involved. Unless you are sure you know the system cold, this is the time to read the service manual and study the schematic diagrams. If you're still unclear about exactly how the system works, talk to an expert (perhaps your mechanic, the manufacturer, or your type-club tech rep) until you're convinced you understand.
Now consider all the possible components in the system that might have failed. Would a failure of that component produce the symptoms you're seeing? What sort of test could be performed to determine whether or not that component was the culprit?
[Figure: C310 electrical diagram]
Consider my voltammeter problem, for example. My review of the schematic showed that the circuit included three shunts, six fuses, a wafer switch, a resistor, a meter movement, and a bunch of wiring. Since the meter seemed to work properly in the L ALT position, I concluded that the meter movement itself was probably OK. The inaccurate voltage readings could conceivably have been caused by a bad resistor, switch, fuse, or a flaky connection somewhere.
Ah, but which was it?
Decide Which To Try First
The science of diagnostic troubleshooting consists of gathering data, analyzing the evidence, studying the system, and coming up with a list of theories that could account for the observed symptoms. The next step is deciding which theory to test first, and that's where the art of troubleshooting begins, and where your experience, judgment and intuition come into play.
If one of your theories seems substantially more likely than the others, that might be the best place to start. Common sense often dictates which theory is most likely. In my T310R charging system, for example, an alternator is more likely to fail than a regulator because:
- The alternator has moving parts, while the regulator is solid-state;
- The alternator is located in the hostile, hot, high-vibration environment of the engine compartment, while the regulator is inside the cabin under the copilot's seat;
- Experience shows 100-amp alternators have a mean time between failures (MTBF) of around 1,000 hours, while solid-state regulators often last for the life of the aircraft.
Similarly, turbocharging problems are more likely to be caused by a faulty wastegate than a faulty controller, because the wastegate spends its life bathed in hot, corrosive 1600°F exhaust gas, while the controller never gets hotter than 200°F.
Alternatively, if one of your theories is a lot easier to test than the others, you might want to start there. If my LOW VOLT annunciator stopped working, the first thing I'd check would be the lamp -- not because it's the most probable failure point, but because it's so darn easy to test.
In the case of my voltammeter problem, I concluded that the meter movement, selector switch and resistor would be hard to access -- the main avionics stack would have to come out to get at them. On the other hand, the fuses and shunts and some of the wiring could be accessed quite easily by pulling a few wing root fairings and an under-wing inspection plate. So it was pretty much a no-brainer where to start.
Occasionally, you'll encounter a situation where the only practical way to test your theory is to replace the suspected component and see if the problem is resolved. In such cases, all other things being equal, it usually makes sense to try replacing the cheapest component first ... or perhaps the component that you have in stock. Once again, use common sense to decide where to start.
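One rough way to combine "most likely" with "easiest or cheapest to test" is to rank each theory by likelihood per dollar spent testing it. The sketch below applies that heuristic to the voltammeter theories. Every number in it is invented for illustration: the likelihoods are guesses, the access hours are guesses, and the $75 shop rate is an assumption. The `Theory` class and `rank` function are hypothetical names, not part of any real tool.

```python
from dataclasses import dataclass


@dataclass
class Theory:
    name: str
    likelihood: float       # rough guess between 0 and 1 (invented for illustration)
    access_hours: float     # labor to reach and test the component (invented)
    part_cost: float = 0.0  # dollars, only if testing means swapping the part


def cost_to_test(theory, shop_rate):
    """Dollars spent just finding out whether this theory is right."""
    # Floor of $1 avoids dividing by zero for a test that costs nothing.
    return max(theory.access_hours * shop_rate + theory.part_cost, 1.0)


def rank(theories, shop_rate=75.0):
    """Sort theories by likelihood per dollar of testing, best bet first."""
    return sorted(
        theories,
        key=lambda t: t.likelihood / cost_to_test(t, shop_rate),
        reverse=True,
    )


theories = [
    Theory("meter movement", 0.10, 4.0),
    Theory("selector switch", 0.35, 4.0),
    Theory("resistor", 0.25, 4.0),
    Theory("fuses and fuseholders", 0.30, 0.5),
]

ordered = rank(theories)
for t in ordered:
    print(f"{t.name}: likelihood {t.likelihood:.2f}, "
          f"about ${cost_to_test(t, 75.0):.0f} to test")
```

With these made-up numbers the fuses come out first even though the selector switch is guessed to be slightly more likely, because the fuses cost a fraction as much to check. The arithmetic is no substitute for judgment, but it makes the trade-off explicit.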
Test The Theory
You've developed theories of what failures could account for the observed symptoms, and chosen one such theory as the most probable, or the easiest or cheapest to test. The next step is to test your chosen theory and see whether or not it pans out.
Your test procedure depends on what it is you're testing. If it's an electrical problem, you'll probably use your digital multimeter, or possibly something more exotic like an oscilloscope. If it's an internal engine problem, you'll probably use a compression tester or borescope. If it's a fuel system problem, you'll probably use pressure gauges. If it's a landing gear problem, you might need jacks, a spring scale, or a flashlight and mirror. And so forth.
Although it's hard to generalize about test procedures, a couple of techniques are especially useful. One is what I call binary troubleshooting where you devise a test procedure that effectively divides the faulty system in half and establishes which half is faulty. The process is then repeated, dividing the faulty half in half and further localizing the problem, and so forth.
Suppose, for example, that an alternator stops working. A logical first test would be to connect a multimeter to the alternator field terminal and see whether voltage is present. If there's no field voltage, then the problem must be in the field circuit -- regulator, overvoltage relay, field switch, field breaker/fuse, or field wiring. One or two more multimeter readings are usually enough to isolate the faulty component. On the other hand, if voltage is present at the alternator field terminal, then the problem is with the alternator or the alternator output breaker or wiring. Again, one or two more multimeter checks will usually nail the culprit.
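For readers who think in code, binary troubleshooting is a binary search along the circuit. The sketch below assumes a single open circuit, so every test point upstream of the break reads live and every point downstream reads dead. The order of the test points is illustrative only (real wiring order varies by aircraft, so check the schematic), and `has_voltage` is a hypothetical stand-in for an actual multimeter reading.

```python
def find_first_dead_point(test_points, has_voltage):
    """Binary-search an ordered chain of test points for the first dead one.

    Assumes a single open circuit: points upstream of the break read live,
    points downstream read dead. Returns (index, readings_taken); the index
    equals len(test_points) if every point is live.
    """
    lo, hi = 0, len(test_points)  # the first dead point lies somewhere in [lo, hi]
    readings = 0
    while lo < hi:
        mid = (lo + hi) // 2
        readings += 1
        if has_voltage(test_points[mid]):
            lo = mid + 1  # live here, so the break is downstream of mid
        else:
            hi = mid      # dead here, so the break is at or upstream of mid
    return lo, readings


# Illustrative ordering from the bus to the alternator; not a real schematic.
points = [
    "bus",
    "field breaker",
    "field switch",
    "overvoltage relay",
    "regulator output",
    "alternator field terminal",
]

# Pretend the circuit is open just ahead of the overvoltage relay.
live = {"bus", "field breaker", "field switch"}
index, readings = find_first_dead_point(points, lambda p: p in live)
print(f"first dead point: {points[index]} ({readings} readings)")
```

Six test points take three readings here, and doubling the number of points would add only one more. The fault then lies between the last live point and the first dead one.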
Another technique that's often useful is swapping components or connections and seeing whether or not the problem switches sides. This is especially helpful in twin-engine aircraft where many systems and components are duplicated.
Suppose your left fuel gauge starts acting up but you can't tell whether it's the sending unit or gauge that's at fault. Try swapping the connections to the two fuel gauges, or swapping the sending units. (Or in a capacitive gauging system, try swapping signal conditioners.) If the problem switches sides, you've proven that one of the components you swapped is at fault. If it doesn't, you've ruled out the swapped component as the culprit.
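The logic of a swap test is simple enough to write down as a tiny decision function. This is a minimal sketch; `interpret_swap` and its argument names are hypothetical, and it assumes only one component is faulty.

```python
def interpret_swap(side_before, side_after, swapped, not_swapped):
    """Interpret a side-to-side swap test, assuming a single faulty component.

    side_before / side_after: which side showed the symptom before and after
    the swap. swapped / not_swapped: descriptions of what was and wasn't swapped.
    """
    if side_after != side_before:
        return f"fault followed the swap: suspect the {swapped}"
    return f"fault stayed put: {swapped} ruled out, suspect the {not_swapped}"


# Left gauge misbehaves; swap the sending units and look again.
followed = interpret_swap("left", "right", "sending unit", "gauge or wiring")
stayed = interpret_swap("left", "left", "sending unit", "gauge or wiring")
print(followed)
print(stayed)
```

Either outcome is useful, which is what makes the swap worth doing: one result convicts the swapped part and the other clears it.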
Cause or Symptom?
Before you conclude you've found and fixed the problem, make sure you've corrected the cause, not just eliminated a symptom.
Suppose you were troubleshooting an alternator failure and discovered that the field fuse had failed. It would be logical to replace the bad fuse. But if that same fuse failed again a month later, then chances are that you have an intermittent short somewhere in the field circuit that needs to be located and corrected.
A few years back, a Cessna Skymaster based at my home airport suffered a failure of the front-engine vacuum pump. The pump was replaced with a new one, but the new pump failed less than 100 hours later. This time, the shop took great pains to blow out the vacuum hoses, replace the central vacuum filter and vacuum regulator filters, and install another new vacuum pump. This time, the pump failed after less than 25 hours. Clearly, something was causing the pumps to fail prematurely. Further investigation showed that the right-angle accessory drive adapter contained a bevel gear that had shed a tooth, resulting in a jerky drive that had been causing the vacuum pumps to fail. Unfortunately, by the time this was discovered, metal from the failed gear had contaminated the engine, and a premature major overhaul was required.
I once heard from the owner of a Cessna 150 that developed starter problems. The plane went into the shop, and the mechanic replaced the O-200 engine's starter adapter with a PMA replacement. During a post-maintenance test run, the starter adapter failed once again. Once again, the shop replaced the starter adapter with another PMA unit (under warranty). This time, when the engine was tested, it suddenly stopped with a loud "bang" with the crankshaft frozen in place. The shop found that the crankshaft drive gear had broken, necessitating an engine teardown.
The moral is clear. If you find a faulty part, replace it, and the new part fails shortly thereafter, don't simply replace the part again without investigating why the part failed.
Don't Overlook The Obvious
Look for obvious things first. Has anything changed recently? Could the problem possibly be your fault?
I recall when my T310R came out of the radio shop after a major avionics installation. (I had a Garmin GPSMAP 530 and a Shadin Digiflo installed.) The owner of the radio shop accompanied me on the initial post-installation test flight, and we put the new radios through their paces. Everything appeared to work perfectly. I was delighted.
A week later, I was flying back to home base from northern California. As the sun set, I turned on the nav and cockpit lights. Soon, I realized that none of the lights on the radio dimmer were working, including the lighting for the primary altimeter. I wound up shooting a night ILS to 500 feet with a penlight in my mouth.
The next morning, I dropped off the plane at the radio shop with a note explaining that the radio lighting hadn't been hooked up correctly and asking that it be fixed. Later that afternoon, I got a call from the shop owner asking me to come by and pick up the plane.
"Wow, you got it fixed that quickly?"
"There was nothing to fix," replied the shop owner. He explained that the only reason that the radio lights didn't come on was because the "day/night" toggle switch on the panel was set in the "day" position.
I never use the day/night switch, so it simply didn't dawn on me to check it. Since the airplane had just come out of a major avionics installation, I simply assumed that the problem was caused by a wiring problem, and never even attempted to isolate the problem. Boy, did I ever feel dumb!
When I stopped by the radio shop to pick up the plane, I could tell from the looks on various faces that the whole crew knew about my goof and had a good laugh over it. They were too polite to tease me about it, but nonetheless I felt like a complete idiot!
Dealing With Intermittents
Intermittent problems are the toughest to troubleshoot. You encounter a problem, but by the time you get your plane to the shop, the problem is gone. As I pointed out earlier, you can't reasonably expect your mechanic to fix a problem that he can't reproduce.
If you ask your mechanic to fix such a problem, about the best he can do is to use a "shotgun" approach: Make a guess at where the problem might lie, replace the suspected component, and hope the problem goes away. If it doesn't, he'll take another guess and replace some other component. (Traditionally, the first component to be replaced is the most expensive one.)
I've found that the best strategy for dealing with such intermittent problems is simply to sit tight, wait awhile, and see what happens. One of two things is likely to happen: (1) The problem will get worse; or (2) the problem will go away.
If the problem gets worse, it's a lot more likely to be reproducible and therefore amenable to troubleshooting. If the problem goes away, it's probably not worth worrying about (although it's undeniably aggravating that the cause remains unknown).
This "sit tight and wait awhile" philosophy has served me well over the years, but it's appropriate only when dealing with problems that are clearly not life-threatening. That's certainly the case with most intermittent problems. But if the intermittent you observe is a major fluctuation in engine oil pressure for 30 seconds, after which everything returns to normal, I would definitely not recommend taking the "sit tight and wait awhile" approach. Prudence would demand landing at the first opportunity and inspecting the engine oil filter and propeller governor gasket screen for the presence of metal. Transient oil pressure fluctuations are most often caused by a chunk of something getting caught in the oil pressure relief valve and interfering with its ability to regulate oil pressure, and must be considered a Very Bad Thing.
Aircraft owners should make every effort to troubleshoot a problem -- or at least verify and isolate it -- before putting the aircraft in the shop. If you can reproduce the problem on the ground, arrange to do so with your mechanic looking on (show him the problem!) so that he sees exactly what you're talking about.
If the problem is one that only occurs in flight, it's especially important to take the fault-isolation process as far as you can, and to document your results in detail. For this kind of problem, the data you provide your mechanic may be all he has to go on in developing his troubleshooting strategy.
In any case, urge your mechanic to spend whatever time and effort is required to diagnose the problem properly before attempting a fix. A good way to discourage shotgunning is to ask your mechanic to call you for approval before replacing any part that costs more than $200 (or whatever figure you're comfortable with). If he calls to say that he wants to install a new frammis, ask him to explain what testing he's done to be certain that your old frammis is actually defective. If you're not satisfied with his answer to that question, you might want to seek an expert second opinion.
My Voltammeter Problem
I almost forgot to tell you how my problem turned out. Because the meter, switch and resistor looked like they'd be a real bear to get to, I decided to start with the battery shunt and fuse block, which I knew were located in the left wing behind the battery box and easily accessible via an underwing inspection panel. (Test the easiest theory first.) Visual inspection with a flashlight and mirror indicated that the fuses were not blown and the wiring looked fine. However, further multimeter tests revealed that with the aircraft powered up, there was a voltage drop across one of the fuseholders. The fuse end-caps were tarnished in color, and the fuseholder clips did not seem to be gripping the fuses very tightly.
I removed the fuses from the fuse block, bent the fuseholder clips so they'd grip tighter, installed new fuses with shiny new end-caps (50 cents apiece at Radio Shack), and applied some aerosol Corrosion X just for good measure. I then climbed into the cockpit and flipped on the battery switch. Sure enough, the voltmeter now read 24 volts, and the battery ammeter showed a small discharge that increased when I turned on the pitot heat and taxi light.
Just for good measure, I removed the wing root fairings and gave the same treatment to the fuses adjacent to the alternator shunts. I figured that after more than 20 years, those fuses were fully depreciated and deserved replacement on general principles.
See you next month.
Want to read more from Mike Busch? Check out the rest of his Savvy Aviator columns.