Firearms examiners suffer from what might be called “Sherlock Holmes Syndrome.” They claim they can “match” a cartridge case or bullet to a specific gun, and thus solve a case. Science is not on their side, however. Few studies of firearms identification exist, and those that do indicate that examiners cannot reliably determine whether bullets or cartridges were fired by a particular gun. Firearms identification, like all purportedly scientific proof, must adhere to consistent and evidence-based standards. Fundamental justice requires no less. Absent such standards, the likelihood of convicting the innocent, and thus letting the guilty go free, is too great. It is perhaps this realization that has led courts to slowly start taking notice and restrict firearms testimony.
In the courts, firearms examiners present themselves as experts. Indeed, they do possess the expertise of a practitioner in the application of forensic techniques, much as a physician is a practitioner of medical tools such as drugs or vaccines. But there is a key distinction between this form of expertise and that of a researcher, who is professionally trained in experimental design, statistics and the scientific method; who manipulates inputs and measures outputs to confirm that the techniques are valid. Both forms of expertise have value, but for different purposes. If you need a COVID vaccine, the nurse has the right form of expertise. By contrast, if you want to know whether the vaccine is effective, you don’t ask the nurse; you ask research scientists who understand how it was created and tested.
Unfortunately, courts have rarely heard testimony from classically trained research scientists who could verify claims made by firearms examiners and explain basic principles and methods of science. Only research scientists have the wherewithal to counter the claims of practitioner-experts. What are needed are anti-expert experts. Such experts are now appearing more and more in courts across the country, and we count ourselves proudly among this group.
Skepticism of firearms identification is not new. A 2009 National Research Council (NRC) report criticized the firearms identification field as lacking “a precisely defined process.” Guidelines from the Association of Firearm and Tool Mark Examiners (AFTE) allow examiners to declare a match between a bullet or cartridge case and a particular firearm “when the unique surface contours of two toolmarks are in ‘sufficient agreement.’” According to the guidelines, sufficient agreement is the condition in which the comparison “exceeds the best agreement demonstrated between tool marks known to have been produced by different tools and is consistent with the agreement demonstrated by tool marks known to have been produced by the same tool.” In other words, the criterion for a life-shaping decision is based not on quantitative standards but on the examiner’s subjective experience.
A 2016 report by the President’s Council of Advisors on Science and Technology (PCAST) echoed the NRC’s conclusion that the firearms identification process is “circular,” and it described the sort of empirical studies required to test the validity of firearms identification. At that time, only one appropriately designed study had been completed, carried out by the Ames Laboratory of the Department of Energy, colloquially called “Ames I.” PCAST concluded that more than a single appropriately designed study was necessary to validate the field of firearm examination, and it called for additional studies to be conducted.
The NRC and PCAST reports were attacked vigorously by firearms examiners. Although the reports per se had little impact on judicial rulings, they did inspire additional tests of firearms identification accuracy. These studies report amazingly low error rates, typically around 1 percent or less, which emboldens examiners to testify that their methodology is nearly infallible. But how the studies arrive at these error rates is dubious, and without anti-expert experts to explain why these studies are flawed, courts and juries can be, and have been, bamboozled into accepting specious claims.
In fieldwork, firearms examiners generally reach one of three categorical conclusions: the bullets are from the same source, called “identification,” a different source, called “elimination,” or “inconclusive,” which is used when the examiner feels the quality of the sample is insufficient for identification or elimination. While this “I don’t know” category makes sense in fieldwork, the clandestine way it has been treated in validation studies—and presented in court—is flawed and seriously misleading.
The problem arises in how to classify an “inconclusive” response in the research. Unlike examiners doing fieldwork, researchers studying firearms identification in laboratory settings create the bullets and cartridge cases to use in their studies. Hence, they know whether comparisons came from the same gun or a different gun. They know “ground truth.” Like a true/false exam, there are only two answers in these research studies; “I don’t know” or “inconclusive” is not one of them.
Existing studies, however, count inconclusive responses as correct (i.e., “not errors”) without any explanation or justification. These inconclusive responses have a huge impact on the reported error rates. In the Ames I study, for example, the researchers reported a false-positive error rate of 1 percent. But here’s how they got to that: of the 2,178 comparisons they made between nonmatching cartridge cases, 65 percent of the comparisons were correctly called “eliminations.” Another 34 percent of the comparisons were called “inconclusive,” but instead of keeping them as their own category, the researchers lumped them in with eliminations, leaving 1 percent as what they called their false-positive rate. If, however, those inconclusive responses are counted as errors, then the error rate would be 35 percent. Seven years later, the Ames Laboratory conducted another study, known as Ames II, using the same methodology and reported false-positive error rates for bullet and cartridge case comparisons of less than 1 percent. However, when inconclusive responses are counted as incorrect instead of correct, the overall error rate skyrockets to 52 percent.
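To make the Ames I arithmetic concrete, here is a minimal sketch in Python. It uses only the rounded percentages quoted above, not the study's raw counts, and the names `AMES_I_PERCENT` and `error_rate` are our own illustrative choices rather than anything taken from the study. The only thing that changes between the two results is the scoring rule applied to “inconclusive.”

```python
# Rounded percentages of the nonmatching Ames I comparisons, as quoted in
# the text above. These are illustrative inputs, not the raw study counts.
AMES_I_PERCENT = {
    "elimination": 65,     # correct call for a nonmatching pair
    "inconclusive": 34,    # examiner declined to decide
    "identification": 1,   # false positive: a nonmatching pair called a match
}


def error_rate(percentages, inconclusive_is_error):
    """Return the error rate on nonmatching pairs under one scoring rule.

    If inconclusive_is_error is False, inconclusive calls are treated as
    correct (the approach the article criticizes). If True, they are
    scored as errors alongside false identifications.
    """
    errors = percentages["identification"]
    if inconclusive_is_error:
        errors += percentages["inconclusive"]
    return errors / sum(percentages.values())


reported = error_rate(AMES_I_PERCENT, inconclusive_is_error=False)
rescored = error_rate(AMES_I_PERCENT, inconclusive_is_error=True)

print(f"Inconclusive scored as correct: {reported:.0%}")
print(f"Inconclusive scored as error:   {rescored:.0%}")
```

The data are identical in both calls. The gap between the two figures comes entirely from a choice the studies made without justification.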
The most telling findings came from subsequent phases of the Ames II study in which researchers sent the same items back to the same examiner to re-evaluate and then to different examiners to see whether results could be repeated by the same examiner or reproduced by another. The findings were shocking: The same examiner looking at the same bullets a second time reached the same conclusion only two thirds of the time. Different examiners looking at the same bullets reached the same conclusion less than one third of the time. So much for getting a second opinion! And yet firearms examiners continue to appear in court claiming that studies of firearms identification demonstrate an exceedingly low error rate.
The English biologist Thomas Huxley famously said that “Science is nothing but trained and organized common sense.” In most contexts, judges display an uncommon degree of common sense. However, when it comes to translating science for courtroom use, judges need the help of scientists. But this help must come not just in the form of scientific reports and published articles. Scientists are needed in the courtroom, and one way for them to help is to serve as anti-expert experts.
This is an opinion and analysis article, and the views expressed by the author or authors are not necessarily those of Scientific American.