The Pilot's Lounge #70: Do Humans Create Those Written Tests?
Every pilot has taken them -- and most have noticed strange or at least confusing questions on those FAA Knowledge Tests. Sometimes it seems like nobody official has even looked at those questions in decades to decide if the questions are relevant. AVweb's Rick Durden met the folks who actually are updating those old tests.
Just recently, one of the regulars here at the pilot's lounge of the virtual airport, Ed Williams, who in the real world is a plasma physicist, took a break in studying for the ATP written, stopped by and showed us one of the published questions on that written. It said: "Who should be advised if there is a suspected radioactive contamination involving a radioactive materials shipment and it is determined that radiological advice is needed?" The "correct" answer is: "The U.S. Energy Research and Development Agency." However, because of Ed's background, he knew that the ERDA was abolished in 1977. Its tasks were handed to the U.S. Department of Energy. Interestingly enough, 49 CFR 175.700 refers to the DOE. So, the question has survived with no correct answer for some 27 years.
The FAA airman written tests (yes, they are all administered on a computer now, but the term "written" sticks around, just as people call the orange colored cockpit voice and data recorders "black boxes") have long been the subject of discussion and debate in the aviation community; I don't know how many times I've gone through CFI refresher courses and been involved in debates over one of the some 11,000 questions that are on the various written exams administered by the FAA for airman certificates and ratings. The discussions have ranged from the nuances of word meanings through heated arguments in which proponents of various points of view pull out the AIM, Advisory Circulars and other documents they claim support their position. And some instructors keep lists of specific written exam questions that are either out of date or wrong so that they can prepare their students for those particular questions.
Nevertheless, for years I have worked with people seeking various ratings -- have taught the subjects and topics I felt were appropriate for them to learn so that they could safely do things they desired to do with aircraft, have sent them off to take the written exams -- and I have never felt that the overall tests were unfair or didn't serve as a pretty accurate reflection of the students' levels of understanding of the subject. I never observed an excellent student fail an exam, although I think one or five may have gotten confused in coloring in the bubbles back in the No. 2 pencil days; and I never saw a student that I thought would score about a 75% come out with a truly high result. I also heard stories about the test creators agonizing over the selection of one word in one question, trying to make sure it was meaningful to the broadest possible cross-section of the population, and of doing computation problems on every type of commercially available flight computer or calculating device to make sure that no matter what the applicant used, the result would be accurate. Finally, I've always felt that creating a test is a cross between science and art, with a soupçon of sadism tossed in for flavor, so I've wondered who writes those tests out there in the windy flatlands of Oklahoma.
Trying To Make Changes
I had the very good fortune to be invited to travel to Oklahoma City and attend the annual Airman Computer Knowledge Testing Program Review in December 2003 and meet not only the employees of the FAA who work for the Airman Testing Standards Branch, which is identified as AFS-630, but also a significant number of the heavyweights in the aviation teaching biz, from Irv Gleim and Martha and John King through representatives from several of the universities and aircraft manufacturers that have aviation training programs. This was the second time one of these reviews had been held and was a part of what appeared to me to be a serious outreach effort by the FAA to get feedback from aviation educators about the entire airman-testing program. Virtually all of the employees of AFS-630 were present at the Review. Some gave presentations on specific topics that led to very spirited discussions, plus there was time for us civilians to meet one-on-one with the specific individuals at AFS-630 who were responsible for the various certification areas such as A & P Mechanic, Instrument Rating, Private Pilot and so forth. What I observed, and I'll talk about it in a little more detail below, was a willingness of the AFS-630 people to listen to what industry and academia had to say; that those in academia and industry did not always speak with one voice; that the various points of view were encouraged by the FAA; and that the schedule allowed enough time to explore particular areas of concern in considerable detail.
That is not to say consensus was reached on issues, but I did not see the FAA go into any sort of defensive mode or exhibit some sort of pride of authorship that discouraged contrary points of view. I also learned that the FAA had made changes in the testing program as a result of the input of industry at the Review a year ago.
Into the Details
The Review began with an exposition of the basics of how the written tests are created and a discussion of the database that exists to support every question. Input for specific questions comes from a variety of sources within both the FAA and NTSB as well as from industry, the European JAA and ICAO, and there is a fairly involved vetting process within the FAA before a question is approved for use in a test. Airman testing was switched over to a computer-based operation some years ago so that all applicants take their tests at a center that is approved by the FAA to administer the knowledge tests. Each test is compiled with questions from specific knowledge areas although the technology does not yet allow the tests to be randomly generated from the pool of available questions. In addition to answering the questions that are a part of the knowledge test the applicant provides a certain amount of personal information, and is asked two or three additional knowledge questions that are "developmental", that is, being considered as questions to be used in the test. Those questions have not been through the shakedown process to be on the knowledge test yet, but are being analyzed, and in that process, the FAA looks at how well the applicants do with them. The test instructions state that those questions are not used in the calculation of the applicant's grade, although a certain percentage of applicants always write in some remarks about having received a test that had too many questions in it. Yes, there is a remarks section for the applicant to make whatever comments he or she feels are appropriate after taking the test. I was interested to learn that at least one person at the FAA reads every single remark made and has used those remarks in changing tests and in the creation of new questions. Naturally, some of the remarks are pretty amazing and are shared (none of the remarks is considered secret). 
Sadly, often the remarks indicate the applicant did not read or understand the instructions or some of the knowledge areas on the test.
The FAA has an extensive database of books, articles, advisory circulars and other documents from which it constructs its questions. My understanding was that each question and answer on a written examination has to have suitable documentation to prove its accuracy at the time it is approved for inclusion in a test. Thus, anytime a question and answer are challenged, the FAA can point to a publication (usually by the FAA) that supports it. That process is not perfect: errors occur, and questions also become out of date.
The FAA currently gives three answer options for each question. After the FAA quit using the true-false tests of the late 1940s and '50s and instituted multiple-choice questions, there were four options for each question. According to FAA personnel, their testing in the 1990s found no significant difference in the rate at which questions were missed whether there were three or four options for the answer, and the time it took to create a fourth option for all 11,000 of the questions for the various written examinations simply was not cost effective.
The FAA also is tasked with overseeing the hundreds of approved testing sites within the country to see that the tests are administered fairly, following specific requirements which include everything from the time allowed for the applicant to take the test to the minimum conditions for the room, so as to keep applicants from having to take a test sitting under a leaking air conditioner while turbine engines are run up on the other side of the wall (it seems that has happened).
Learn For The Test, Or Test The Learning?
The announced goal of the computerized airman-testing program is to turn out pilots who can operate safely in the national airspace system. AFS-630 personnel admit that a written test cannot do all that is required to meet this goal. The practical test (oral and flight) is a necessary adjunct because there is no way that a written test can ever measure all that is required to be a pilot. However, the personnel at AFS-630 explained that it was their goal to write test questions that, to the extent possible, were targeted at the appropriate skill level for the rating in question, and required that the applicant use her or his reasoning and aeronautical knowledge to determine the correct answer. I observed a unanimous approach to the testing procedure within the room, from academia, industry and the FAA, to try to avoid questions that tested rote memorization and work toward a goal of questions that required reasoning to apply what the applicant knew to real world scenarios. Naturally, that brought out concern over the fact that the FAA publishes all of the questions and answers, and identifies the correct answers. Therefore, what is to stop applicants from simply memorizing the questions and answers without truly learning? The concern for memorizing test questions, in a nutshell, was the dilemma facing the FAA, academia and industry, and was the basis for all of the disagreements I observed. (Interestingly, the disagreements did not pit the FAA on one side, versus academia and industry on the other, but ignored that line completely and reflected a split in fundamental points of view of very educated people, regardless of their employer, who are in the business of teaching and testing pilots as to the best way of going about those tasks.)
Applicants do memorize the questions and answers. I learned that some walk in and take a computerized written in fewer than ten minutes and ace it, a fact that astounded and concerned me. I also learned that at the first Review, held a year ago, someone in industry suggested randomly changing the order in which the three possible answers to each question were presented so that a person who memorized that the answer to the question on VOR receiver off flags was "B" would actually have to read the possible answers. Because the tests are computer generated, that could be, and was, done. The results were immediate, with average test scores going down. Of course, a certain percentage of applicants expressed great dudgeon in their remarks that the test they got was all messed up because the answers to the questions were not in the correct order and they had to take all of this time to read each one. When those remarks were read, I detected no sympathy from anyone in the room.
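For readers who like to see the mechanics, here is a minimal Python sketch of the answer-shuffling idea. It is purely illustrative and is not the FAA's software: the Question class, the shuffle_options function and the sample question text are all hypothetical names made up for this example. The only point it makes is that the options can be reordered for each applicant while the program still keeps track of which one is correct.

```python
import random
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Question:
    """Hypothetical record for one multiple-choice question."""
    stem: str
    options: Tuple[str, ...]
    correct_index: int  # position of the correct option within `options`


def shuffle_options(question: Question, rng: random.Random) -> Question:
    """Return a copy of the question with its options in random order.

    The correct answer is tracked by content, not by letter, so an
    applicant who only memorized "B" gains nothing from it.
    """
    order = list(range(len(question.options)))
    rng.shuffle(order)  # order[new_position] == old_position
    new_options = tuple(question.options[i] for i in order)
    new_correct = order.index(question.correct_index)
    return Question(question.stem, new_options, new_correct)


# Made-up sample question; the wording is an assumption, not a real test item.
q = Question(
    stem="Sample question text",
    options=("First option", "Second option", "Third option"),
    correct_index=1,
)
shuffled = shuffle_options(q, random.Random(2003))
```

Whatever order comes out, shuffled.options[shuffled.correct_index] is still "Second option", so grading is unaffected; only the letter in front of the right answer changes.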
Another subject that came up was the wording of questions. The FAA personnel were extremely aware of cultural differences within the U.S. in terms of shades of meaning of words and word order. On top of that, they were also aware of the fact that a definite proportion of those who take the writtens do not speak English as their first language, and despite the FAR requiring one to read, speak and understand English, word meanings are by no means universal. What I saw, and heard about in private conversations, was a constant effort on the part of the AFS-630 personnel to make questions and answers that are as clear, understandable and unambiguous as possible. Nevertheless, they were aware that the process isn't perfect. Therefore, they also rely on the vast amount of information that is available from the computer analysis of test results to help identify questions that have an unusually high rate of incorrect answers and pull them for review.
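The kind of analysis described above can be sketched in a few lines of Python. This is only a guess at the general idea, not a description of how AFS-630 actually does it: the function name flag_questions, the 50% miss-rate cutoff and the 30-attempt minimum are assumptions chosen for illustration, and the question IDs and response data are made up.

```python
from collections import defaultdict


def flag_questions(responses, max_miss_rate=0.5, min_attempts=30):
    """Return {question_id: miss_rate} for questions missed unusually often.

    `responses` is an iterable of (question_id, was_correct) pairs.
    Both thresholds are illustrative assumptions, not FAA values.
    """
    attempts = defaultdict(int)
    misses = defaultdict(int)
    for question_id, was_correct in responses:
        attempts[question_id] += 1
        if not was_correct:
            misses[question_id] += 1

    flagged = {}
    for question_id, count in attempts.items():
        miss_rate = misses[question_id] / count
        # Skip questions with too few attempts to say anything meaningful.
        if count >= min_attempts and miss_rate > max_miss_rate:
            flagged[question_id] = miss_rate
    return flagged


# Made-up data: Q1 is missed 40 times out of 50, Q2 only 5 out of 50,
# and Q3 has just three attempts, too few to judge.
sample = (
    [("Q1", False)] * 40 + [("Q1", True)] * 10
    + [("Q2", False)] * 5 + [("Q2", True)] * 45
    + [("Q3", False)] * 3
)
flagged = flag_questions(sample)
```

With this sample, only Q1 is flagged. A high miss rate does not by itself prove a question is bad; it simply marks the question as one a human should pull and look at.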
Several questions are prepared for each specific knowledge area that is felt to be appropriate for examination. Such a series of questions on a particular knowledge area is referred to as "parallel questions". Usually an applicant will only get one of any series of "parallel questions" on a test.
The FAA has announced a policy under which it may only publish one of each series of "parallel" questions from specific knowledge areas. Thus, a fact situation may present an airport diagram and then have six or seven questions in which a particular runway is active and an aircraft sitting at some point on the airport is given a clearance to taxi to some other point and the applicant is asked whether the aircraft can taxi to that point or must hold short at some intermediate location. Because the specific knowledge area involves only a taxi clearance, the FAA would like to publish only one of the questions because there is a desire to require that the applicant be able to reason through such a situation even if he or she has never seen the particular question before.
Naturally, the idea of no longer publishing all of the test questions was not greeted with universal enthusiasm. It was interesting to hear both sides of the issue. The AFS-630 personnel wanted questions that required the applicants to think, and felt that by not handing them all of the questions on a platter, that goal would be better reached. After all, for many years the FAA did not publish any of the test questions and answers and somehow applicants struggled through. On the other hand, there is a certain level of distrust of our government as well as the belief that in a democracy that government of the people should be open to the people, so the test questions and answers should not be hidden. An undercurrent was that a certain percentage of the test questions simply have the wrong answer and it is necessary for the educators to teach those particular questions to the students who are paying to learn how to fly.
Not publishing all parallel questions incited the most intense discussions of the day. There were very strongly held opinions on both sides of the issue that were expressed in well reasoned, but impassioned fashion. One of the concerns expressed was that a question may not be truly "parallel" in that it actually inquires into more than one knowledge area, and thus might improperly be withheld. One person who commented felt that the only truly "parallel" questions are ones involving calculations and thus those are the only ones that should be subjected to withholding.
Secret Questions, Secret Answers
Political concerns played a role in the positions taken by various individuals. At lunch, when we were broken into small groups at individual tables, a number of industry representatives made comments and gave examples about the fact that the current administration in Washington was being more secretive in government dealings than any since that of President Nixon and they were observing such secrecy in some of the things the FAA was doing in rule-making. Therefore, even though AFS-630 seemed to be a shining exception to what was going on elsewhere, the same industry reps were decidedly uncomfortable with the idea of hiding any test questions.
At the end, I found myself leaning toward the full disclosure group. I am aware that some of the answers are just plain wrong or out of date, and therefore unless published, there is no hope, other than blind luck, that an applicant will get those correct. However, I also despise the idea of instructing for a test as, in my opinion, it turns out a shallower, less well-rounded pilot. In a perfect world, if the FAA had adequate resources to get rid of all out of date questions and make sure all answers were correct, I would prefer that the questions not be disclosed ahead of time so that the test would be more valid in assuring the applicant really understood the material. But, this is not a perfect world. . .
The FAA apparently has an exemption from the Freedom of Information Act so that it is not required to disclose all written test questions. Nevertheless, going back to the bad old days where test questions were not published reminded me of the standing offer that was made by one of the aviation education outfits back in the late '60s in which it would pay cash to applicants for each verbatim question and alternative answers they memorized and brought out after taking the written. I hope we don't go back to that practice.
Interestingly, one of the solutions discussed was for the FAA to continue to publish the questions and answers, but not segregate them by the written test to which they apply, just set out all 11,000 of them in a big jumble. The more I thought about such a Purloined Letter approach, the more I liked it.
The parallel test question was the contentious issue of the seminar. What particularly struck me was that I found that I expected the AFS-630 personnel to go into a defensive mode and simply support the plan to not publish all of the questions, yet they did not. What I observed was that everyone from AFS-630 who got involved in the discussion was willing to listen to the comments and a couple of times responded that they had not considered some of the points that were raised. It was one of those rare times when bright people do their utmost to explore all of the ramifications of an issue in an attempt to find the best overall answer rather than try to set out some ideology. There were a couple of exchanges regarding technical concepts in education and testing that went well above my head but impressed me with the level of educational and testing expertise within AFS-630 and the aviation education business. My impression at the close of the rather extended discussion was that the FAA will probably go forward with the idea of only publishing one of the "parallel" questions in specific knowledge areas; however, the AFS-630 personnel are so aware of the issues involved that they will move with some care. If I were a betting person, I would bet that they initially institute the program only with calculation questions.
There were a number of references to out-of-date questions. One person said that there were too many ADF questions on the private written and that there should be some on GPS, because student pilots rarely experience ADFs. The response was complete agreement. AFS-630 personnel said they are working on GPS questions, but avionics technology is changing so rapidly that it has been difficult to create suitable questions.
One member of industry started a discussion by asking that an applicant be given the specific questions missed along with the test results rather than just referencing knowledge areas missed. AFS-630 had not had any history of requests for the specific questions missed, although from time to time applicants did call and want to talk about which question they might have missed and the certification managers did discuss them with applicants who called. This was a new topic that I expect will come up in the future because, as a flight instructor, it would help my review with the student if I knew the specific questions missed. Some of those who were present were also designated pilot examiners. They expressed support for knowing the specific questions applicants had missed, as it would be helpful in conducting the oral portion of the airman practical test. It was pointed out that if the policy of not publishing parallel questions takes effect and applicants are provided with the questions they miss, a cheating technique to obtain all the parallel questions would be to have applicants take the tests and intentionally give wrong answers to any questions that didn't look familiar.
That exchange led to some interesting conversation with regard to cheating on tests. There was absolutely no sympathy expressed by anyone in the room for someone who would cheat on a test that involves safe operation in an environment as unforgiving as the sky. Some of the examples were pretty mind-boggling: The report of an airline captain who took a private written for his son caused exclamations of disbelief. The sheer ability to collect and correlate data because the tests are computerized has allowed the FAA to be more effective in catching cheaters.
Sifting Through the Data
Because of the massive amount of data collected, the FAA can tell things like whether graduates of Part 141 and 147 schools average higher test scores than those who aren't graduates (they don't, although not all school graduates identify themselves), which questions are missed more than others, and how many attempts the average applicant needs to pass a written (one). The data also show that there has been a substantial decrease in the number of tests administered in the past few years (on the order of 25%), something that should concern all of us. I asked whether any attempt had been made to correlate a person's score on the written exams and the chance of having an accident. I learned that there had been no attempt to cross-reference the accident reports versus the scores the pilots had received on writtens. Given that the purpose of the airman written test program is to develop pilots who are competent to operate in the national airspace system, I, for one, would like to see if there is any relationship between failing a written or just barely passing and eventually rolling an airplane into a ball.
Now, if someone will please provide AFS-630 with the money to carry out that research.
During the course of the Review, I came to the conclusion that AFS-630 is engaged in an on-going effort to reach out to industry and academia to get as much input as possible so that it can do its work effectively. I saw that all of the people involved, FAA and "civilian", had respect for the level of knowledge and experience each other had (for example, one of the AFS-630 certification managers was a charter pilot in piston twins, flew freight and finally, MD-11s for an airline before coming to the FAA) and so neither side felt threatened by the other. Thus, the technical discussions went on in an open atmosphere, even when there were sharp differences of opinion. The phone number of each AFS-630 certification manager was given to all participants so that if a subsequent decision proves unpopular, those managers will get an earful. It also meant that they were going to hear about specific test questions that were out of date or wrong and may speed up the process of culling those questions.
I hope that the Reviews are continued annually. Putting them on takes a lot of effort; however, from what I saw, the level at which information and points of view were exchanged made it extremely valuable for everyone who attended. In a time when government is becoming increasingly secretive, the openness of AFS-630 is an example that more agencies should follow.
By the way, as I was writing this column, I contacted AFS-630 about the ATP question at the beginning of it. It was removed from the pool of questions available for the ATP written immediately.
See you next month.