Judging Firearms Evidence

Firearms violence results in hundreds of thousands of criminal investigations each year. To try to identify a culprit, firearms examiners seek to link fired shell casings or bullets from crime scene evidence to a particular firearm. The underlying assumption is that firearms impart unique marks on bullets and cartridge cases, and that trained examiners can identify these marks to determine which were fired by the same gun. For over a hundred years, firearms examiners have testified that they can conclusively identify the source of a bullet or cartridge case. In recent years, however, research scientists have called into question the validity and reliability of such testimony. Judges largely did not view such testimony with increased skepticism after the Supreme Court set out standards for screening expert evidence in Daubert v. Merrell Dow Pharmaceuticals, Inc. Instead, the surge in judicial rulings came more than a decade later, particularly after reports by scientists shed light on limitations of the evidence.

In this Article, we detail over a century of case law and examine how judges have engaged with the changing practice and scientific understanding of firearms comparison evidence. We first describe how judges initially viewed firearms comparison evidence skeptically and thought jurors capable of making firearms comparisons themselves—without an expert. Next, judges embraced the testimony of experts who offered more specific and aggressive claims, and the work spread nationally. Finally, we explore the modern era of firearms case law and research. Judges increasingly express skepticism and adopt a range of approaches to limit in-court testimony by firearms examiners.

In December 2023, Rule 702 of the Federal Rules of Evidence was amended, for the first time in over twenty years, specifically due to the Rules Committee’s concern with the quality of federal rulings regarding forensic evidence, as well as the failure to engage with the ways that forensic experts express conclusions in court. There is perhaps no area in which judges, especially federal judges, have been more active than in the area of firearms evidence. Thus, the judging of firearms evidence has central significance for the direction that scientific evidence gatekeeping may take under the revised Rule 702 in federal and, in turn, state courts. We conclude by examining lessons regarding the gradual judicial shift toward a more scientific approach. The more-than-a-century-long arc of judicial review of firearms evidence in the United States suggests that, over time, scientific research can displace tradition and precedent to improve the quality of justice.

INTRODUCTION

On November 11, 2016, a police officer recovered a forty-caliber Smith & Wesson cartridge casing from the scene of a homicide in Washington, D.C.1See United States v. Tibbs, No. 2016 CF1 19431, 2019 D.C. Super. LEXIS 9, at *8 (D.C. Super. Ct. Sept. 5, 2019). A police officer reported seeing a person discarding a Smith & Wesson semiautomatic pistol shortly after the homicide occurred.2Id. Police sent the recovered cartridge casing to the crime lab, where an examiner identified it—conclusively—“as having been fired” by the pistol recovered from the defendant,3Id. at *8–9. who was charged with first-degree murder.4Id. at *8. As the case approached trial, the defense challenged the admissibility of this proffered expert testimony, arguing it should be excluded because it was not the “product of reliable principles and methods.”5Id. at *12. One of the authors served as an expert in the case. See id. at *9. In other words, the method lacked “scientific validity.” After hearing from several experts and reviewing published studies, Washington, D.C. Superior Court Associate Judge Edelman found that there was insufficient evidence that firearms examiners can reliably make an identification.6Id. at *3 (“According to the government’s proffer, this analysis permitted the examiner to identify the recovered firearm as the source of the cartridge casing collected from the scene.”). The judge ruled that an expert could—at most—opine that “the recovered firearm cannot be excluded as the source of the cartridge casing found on the scene of the alleged shooting.”7Id. at *77 (emphasis added); see also id. at *2. As we will describe, this is a powerful new limit on firearms evidence, a field in which experts have confidently concluded for decades that one and only one firearm—to the exclusion of all other firearms in the world—could have fired the ammunition found at a given crime scene.8See Brandon L. Garrett, Nicholas Scurich & William E. Crozier, Mock Jurors’ Evaluation of Firearm Examiner Testimony, 44 Law & Hum. Behav. 412, 413 (2020) (studying jury evaluation of firearm expert testimony and finding “cannot exclude” language to influence verdicts); infra Part II.

While this case represented just one trial judge’s ruling, it not only forms part of a sea change in judicial review of firearms evidence; its local repercussions also point to more fundamental problems in our criminal system. Consider a later case before Judge Edelman, this one involving charges against two men for two killings linked by firearms evidence. Prosecutors were understandably concerned.9Jack Moore, DC Judge Orders Forensic Lab to Turn Over Some Documents Sought by Prosecutors, WTOP News (Nov. 10, 2020, 2:34 PM), https://wtop.com/dc/2020/11/dc-judge-orders-forensic-lab-to-turn-over-some-documents-sought-by-prosecutors [https://perma.cc/L5X2-JGJG]. In this case, D.C.’s Metropolitan Crime Lab had reported that the same weapon fired the cartridge casings found at each crime scene.10Id. Perhaps because they feared that the judge might view the evidence with renewed skepticism, the prosecutors took an unusual step: they asked independent examiners to take a look at the evidence.11Id.

The independent experts definitively concluded that two different firearms were involved—the opposite of what the D.C. crime lab examiners had concluded.12Jack Moore & Megan Cloherty, ‘You Can Trust This Laboratory’: DC Crime Lab Director Responds to Scrutiny of Firearms Unit, WTOP News (Dec. 2, 2020, 4:24 AM), https://wtop.com/dc/2020/12/you-can-trust-this-laboratory-dc-crime-lab-director-responds-to-scrutiny-of-firearms-unit [https://perma.cc/X93R-SUP7].
Internally, the lab examiners reexamined the evidence and agreed the cartridges came from different weapons. After meeting with lab managers, however, they instead reported an altered finding of “inconclusive,” meaning that no conclusion could be reached.13Prosecution’s Praecipe at 3, United States v. McLeod, No. 2017-CF-19869 (D.C. Super. Ct. Mar. 22, 2021). Lab management notified the ANSI National Accreditation Board (“ANAB”), which accredited the lab, that an internal review resulted in an “inconclusive” finding, but the audit that followed found that the lab managers had acted to conceal the errors in the case.14See id. at 2–3 (“DFS management not only failed to properly address the conflicting results reported to the DFS by the USAO, but also engaged in actions to alter the results reached by the examiners assigned to conduct a reexamination of the evidence.”). In April 2021, ANAB suspended the lab’s accreditation, and as a result, the lab was shut down.15Keith L. Alexander, National Forensics Board Suspends D.C. Crime Lab’s Accreditation, Halting Analysis of Evidence, City Says, Wash. Post (Apr. 3, 2021, 7:43 PM), https://www.washingtonpost.com/local/public-safety/dc-lab-forensic-evidence-accreditation/2021/04/03/723c4832-94aa-11eb-a74e-1f4cf89fd948_story.html [https://perma.cc/2YS5-Y6QG].
Prosecutors then opened a new probe into its firearms unit, the lab director resigned,16Paul Wagner, D.C. Crime Lab Under Investigation After Allegations of Wrongdoing, NBC News (Apr. 8, 2021, 8:40 PM), https://www.nbcwashington.com/news/local/dc-crime-lab-under-investigation-after-allegations-of-wrongdoing/2634489 [https://perma.cc/4NJ5-GP4K].
the lab disbanded, and the firearms unit remains closed as of this writing.17Jack Moore, D.C. Abruptly Disbands Crime Lab’s Firearms Unit, WTOP News (Sept. 16, 2021, 4:00 PM), https://wtop.com/dc/2021/09/dc-abruptly-disbands-crime-labs-firearms-unit [https://perma.cc/C3YN-LCYJ]. It appears that in December 2023, the D.C. crime lab regained partial accreditation. As of this writing, however, the firearms unit has not regained accreditation, and it remains closed. Mark Segraves, DC Forensic Crime Labs Regain Accreditation After Nearly 3 Years, NBC Wash. (Dec. 27, 2023, 1:25 PM), https://www.nbcwashington.com/news/local/dc-forensic-crime-labs-regain-accreditation-after-nearly-3-years/3501258 [https://perma.cc/U342-NCE5]; Ivy Lyons, DC Crime Lab Appears to Regain Partial Accreditation After Losing Ability to Process Evidence in 2021, WTOP News (Dec. 26, 2023, 3:11 PM), https://wtop.com/dc/2023/12/dc-crime-lab-regains-some-accreditation-3-years-after-losing-ability-to-process-evidence [https://perma.cc/2TGY-USKX].

This rapidly unfolding crisis began with a spot-check in a single case prompted by a judge asking a fundamental question: How often do firearms examiners get it right versus wrong? For decades, few judges asked the question, but as we detail in this Article, judges have become increasingly engaged with the underlying science and have transformed a backwater area of forensic evidence into a subject of complex litigation. Indeed, in no other area have judges engaged in such a detailed manner with the limits of the testimony expressed by examiners—making firearms evidence the most prominent testing ground for the 2023 amendments to the Federal Rules of Evidence, designed to tighten judicial review of experts more generally, but with a focus on forensic evidence more specifically.18Advisory Comm. on Rules of Prac. and Proc., June 2022 Agenda Book 891–93 (2022) [hereinafter 2022 Comm. on Rules of Prac. and Proc.]; Fed. R. Evid. 702 (2023 amendment).

Firearms examination is in great demand, with more than a hundred thousand requests for a forensic firearm examination each year in the United States.19See Matthew R. Durose, Andrea M. Burch, Kelly Walsh & Emily Tiry, Bureau of Just. Stats., NCJ 250151, Publicly Funded Forensic Crime Laboratories: Resources and Services, 2014 3 (2016). Firearms violence is a major problem in the United States—each year, more than ten thousand homicides and almost five hundred thousand other crimes, such as robberies and assaults, are committed using firearms.20See Gun Violence in America, Nat’l Inst. of Just. (Feb. 26, 2019), https://www.nij.gov/topics/crime/gun-violence/pages/welcome.aspx [https://perma.cc/4TXL-K3NC]; 2018 January-June Preliminary Semiannual Uniform Crime Report: Crime in the United States, FBI (2018), https://ucr.fbi.gov/crime-in-the-u.s/2018/preliminary-report [https://perma.cc/VMU8-ZYSG].
When conducting these comparisons, examiners seek to link crime scene evidence—such as spent cartridge casings or bullets—with a firearm. These examiners assume that the manufacturing processes used to cut, drill, and grind a gun leave distinct and identifiable markings on the gun’s barrel, breech face, firing pin, and other components. When the firearm discharges, those components in turn contact the ammunition and leave marks on it. Experts have long assumed, as we will describe, that firearms leave distinct toolmarks on ammunition.21See infra Section I.A. They believe that they can definitively link spent ammunition to a particular firearm using these toolmarks.22See id. And for over a hundred years, examiners have offered criminal trial testimony relying on this assumption.23See infra Part I.

In recent years, the consequences of the uncritical judicial acceptance of firearms comparison testimony have come into sharper focus. Indeed, we now know that firearms evidence played a central role in numerous high-profile wrongful convictions. In the 2014 per curiam opinion in Hinton v. Alabama, for example, the U.S. Supreme Court reversed a conviction due to the defense lawyer’s inadequate performance in failing to develop firearms evidence at a capital murder trial.24Hinton v. Alabama, 571 U.S. 263, 264 (2014). The central evidence was a State Department of Forensic Sciences examiner’s conclusion that six bullets were fired from the same gun: “[T]he revolver found at Hinton’s house.”25Id. at 265. The defense did not hire a competent and qualified expert, and the Court emphasized that “the only reasonable and available defense strategy require[d] consultation with experts or introduction of expert evidence.”26Id. at 273 (quoting Harrington v. Richter, 562 U.S. 86, 106 (2011)). Hinton was subsequently exonerated, and he commented: “I shouldn’t have [sat] on death row for thirty years . . . . All they had to do was to test the gun.”27Abby Phillip, Alabama Inmate Free After Three Decades on Death Row: How the Case Against Him Unraveled, Wash. Post (Apr. 3, 2015, 10:28 PM), https://www.washingtonpost.com/news/morning-mix/wp/2015/04/03/how-the-case-against-anthony-hinton-on-death-row-for-30-years-unraveled [https://perma.cc/5QPA-4M83].

This Article presents the results of a comprehensive review of all judicial rulings in the United States concerning firearms comparison evidence. Our database of more than 300 judicial rulings is available as a resource online.28See Firearms Expert Evidence Database, Ctr. for Stats. and Applications in Forensic Evidence (2022), https://forensicstats.org/firearms-expert-evidence-database [https://perma.cc/LR4J-RLU4]. The database “ha[s] assembled reported decisions, chiefly by appellate courts, that discuss the admissibility of expert testimony regarding firearms comparison evidence.” Id. The database consists of written, published decisions (largely appellate opinions but also some trial rulings).29The cases that are included in this database were:

[G]athered using searches of the Westlaw legal database, across all fifty states and the federal government, with rulings dating back over one hundred years. Where possible, trial rulings were obtained, but generally these cases reflect reported, written decisions containing the keywords used, and therefore largely reflect appellate rulings. The cases are searchable across a range of characteristics, including basic information concerning the state, year, type of court, and parties, but also details concerning the basis of the rulings and the factors relied upon by each court. The database describes whether the ruling employed a Daubert or Frye standard, or a ruling regarding local rules of evidence, and what the result of that ruling was.

Id.
We describe the three-part story of the path of firearms evidence: (1) initial skepticism of a novel set of methods; (2) national acceptance of increasingly powerfully stated conclusions regarding firearms; and finally (3) a surge in judicial opinions and skepticism of firearms comparison evidence that followed, not Daubert and the new reliability-focused standards for judicial review of scientific evidence, but rather a series of scathing reports by the scientific community calling into question the reliability of firearms evidence.

First, we describe how in the earliest cases, judges were actually quite skeptical of firearms comparison evidence, particularly when presented by self-styled experts, and often concluded that jurors were capable of making the comparisons themselves, without a need for expert testimony.30See infra Part I. However, particularly due to the influence of the flamboyant Major Calvin Goddard and his disciples, courts gradually embraced firearms comparison evidence as the subject of expert testimony.31See infra Part I.

Second, we document how the claims made by experts became more specific and aggressive as the work spread nationally.32See infra Part I. Rather than simply describing a comparison between two sets of objects, firearms experts testified by making “uniqueness” claims: the theory that “no two firearms should produce the same microscopic features on bullets and cartridge cases such that they could be falsely identified as having been fired from the same firearm.”33Erich D. Smith, Cartridge Case and Bullet Comparison Validation Study with Firearms Submitted in Casework, 36 AFTE J. 130, 130 (2004) (quoted in United States v. Monteiro, 407 F. Supp. 2d 351, 361 (D. Mass. 2006)). By the 1960s, this expert testimony was offered and accepted across the country. Professional groups emerged and set standards for the field, which courts took note of. Written judicial opinions became quite uncommon, and any judicial skepticism was largely limited to more unusual applications of the methods rather than the underlying methodology itself.34See infra Part I.

Third, we explore the modern era of firearms case law and research, with increasingly intense judicial interest and written opinions on the topic in the last two decades.35See infra Part II. In 1993, the Supreme Court decided Daubert v. Merrell Dow Pharmaceuticals,36Daubert v. Merrell Dow Pharm., Inc., 509 U.S. 579 (1993). and under that decision, its progeny, and the revised Federal Rule of Evidence 702 (“Rule 702”) and its state-law analogues, judges now bear clearer and more rigorous gatekeeping responsibilities to assess the reliability of scientific evidence.37See generally, e.g., David L. Faigman, The Daubert Revolution and the Birth of Modernity: Managing Scientific Evidence in the Age of Science, 46 U.C. Davis L. Rev. 893 (2013). Accompanying this shift in the courts, by the late 1990s, experts premised testimony on a “theory of identification” set out by a professional association, the Association of Firearms and Tool Mark Examiners (“AFTE”).38See infra Part II. The AFTE instructs practitioners to use the phrase “source identification” to explain what they mean when they identify “sufficient agreement” of markings when examining bullets or cartridge cases.39What Is Firearm and Toolmark Identification?, The Ass’n of Firearm and Toolmark Examiners, https://afte.org/about-us/what-is-afte/what-is-firearm-and-tool-mark-identification [https://perma.cc/XAU7-5Y4M].

In recent years, scientists have called into question the validity and reliability of this testimony—contributing to an explosion of judicial rulings. In a 2008 report, the National Academy of Sciences (“NAS”) found that “[t]he validity of the fundamental assumptions of uniqueness and reproducibility of firearms-related toolmarks has not yet been fully demonstrated.”40Nat’l Rsch. Council of the Nat’l Acads., Ballistic Imaging 81 (Daniel L. Cork et al. eds., 2008) [hereinafter 2008 NAS Report]. In its 2009 report, the NAS concluded that “[s]ufficient studies have not been done to understand the reliability and repeatability of the methods.”41Nat’l Rsch. Council of the Nat’l Acads., Strengthening Forensic Science in the United States: A Path Forward 154 (2009) [hereinafter 2009 NAS Report]. The report also noted “the lack of a precisely defined process . . . [that] does not even consider, let alone address, questions regarding variability, reliability, repeatability, or the number of correlations needed to achieve a given degree of confidence.”42Id. at 155. Judges have also raised concerns about the lack of specificity in the examination process. See, e.g., United States v. Green, 405 F. Supp. 2d 104, 114 (D. Mass. 2005) (stating the method is “either tautological or wholly subjective”); United States v. Shipp, 422 F. Supp. 3d 762, 779 (E.D.N.Y. 2019) (“[T]he sufficient agreement standard is circular and subjective.”). Over half of the judicial rulings that we identified have occurred since 2009, the year that the NAS issued its pathbreaking report. We detail dozens of opinions that have limited testimony of firearms experts in increasingly stringent ways.

Solidifying this trend, in 2016, the President’s Council of Advisors on Science and Technology (“PCAST”) reviewed in detail all of the firearm examiner studies that had been conducted to date,43President’s Council of Advisors on Sci. and Tech., Forensic Science in the Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods X (Sept. 2016) [hereinafter PCAST Report]. deemed only one of them appropriately designed, and concluded that “the current evidence falls short of the scientific criteria for foundational validity.”44Id. at 111. Most recently—beginning in the aforementioned 2019 case before Judge Edelman—scientists have testified about the research base of firearm examination.45David L. Faigman, Nicholas Scurich & Thomas D. Albright, The Field of Firearms Forensics is Flawed, Sci. Am. (May 25, 2022), https://www.scientificamerican.com/article/the-field-of-firearms-forensics-is-flawed [https://perma.cc/ZM4A-TLMQ]. These experts include psychologists, statisticians, and other academics with training in conducting science, rather than in applying a forensic technique. As one judge put it, “[R]arely do the experts fall into such cognizable camps, forensic practitioners on one side and academic researchers on the other.”46People v. Ross, 129 N.Y.S.3d 629, 639 (N.Y. Sup. Ct. 2020).

These modern critiques have had concrete effects on the admissibility of firearm examination evidence, but only gradually. Comforted by more than a century of long-standing precedent, judges were slow to react to scientific concerns raised regarding firearms comparison evidence, even after the Daubert ruling. Yet in more recent years, as lawyers have increasingly litigated the findings of scientific reports and error rate studies, we have seen a dramatic rise in judges’ willingness to engage with the scientific limitations of the methods.47See infra Section II.D. Most judges have responded by imposing limits on how experts phrase conclusions in testimony, but there are reasons to doubt that this compromise solution will sufficiently inform lay jurors of the limits of the method.48Regarding effectiveness of such measures, see Garrett et al., supra note 8, at 421–22. For further discussion, see infra Part III.

Effective December 1, 2023, Federal Rule of Evidence 702 was amended for the first time since 2000.492022 Comm. on Rules of Prac. and Proc., supra note 18, at 891–93. The Advisory Committee notes emphasize that these revisions are “especially pertinent” to forensic evidence.50Memorandum from the chair of the Committee on Rules of Practice and Procedure to the clerk of the Supreme Court 227 (Oct. 19, 2022), https://www.uscourts.gov/sites/default/files/2022_scotus_package_0.pdf [https://perma.cc/QS33-9DTQ].
Further, for forensic pattern-comparison methods like firearms evidence, the committee noted that opinions “must be limited to those inferences that can reasonably be drawn from a reliable application of the principles and methods.”51Id. at 230. The amended Rule 702 specifically directs judges to (1) consider more carefully whether the proponent of an expert has carried the burden of showing that the various reliability requirements are met and (2) ensure that the opinions the expert formed are reliably supported by the application of the methods to the data.522022 Comm. on Rules of Prac. and Proc., supra note 18, at 891–93. These changes squarely address the issues that judges have grappled with in the area of firearms evidence, perhaps more prominently than in any other area of scientific evidence, targeting the two main concerns that judges have raised: the reliability of the methods and the overstatement of conclusions.

Thus, the body of case law regarding firearms evidence may only grow, and it may be a harbinger for how judges will engage with scientific evidence more broadly after the rule change. In a 2023 decision, the Supreme Court of Maryland held that an expert may opine only on whether spent bullets or cartridges are “consistent or inconsistent” with those known to have been fired by a particular weapon.53Abruquah v. State, 483 Md. 637, 648 (2023). In perhaps a sign of things to come, a trial judge in Cook County, Illinois, recently excluded firearms expert testimony entirely, based on scientific concerns with reliability, after conducting an extensive evidentiary hearing. There, the judge concluded that the probative value of the evidence was a “big zero” and raised the concern of “yet another wrongful conviction” based on such evidence if the jurors viewed “[t]he combination of scary weapons, spent bullets, and death pictures without even a minimal connection” to expertise that is repeatable and reproducible.54See People v. Winfield, No. 15-CR-1406601, at 32–34 (Cir. Ct. Cook Cnty. Ill. Feb. 8, 2023).

More fundamentally, these developments suggest that it takes engagement by the scientific community for judges and lawyers to carefully engage with the reliability rules set out in Daubert and in Rule 702. Prominent scientific reports and studies have helped judges and lawyers apply scientific criteria to firearms examinations. The result has limited unsupported use of these firearms comparisons and may promote better methods in the future that can prevent errors and wrongful convictions.55See infra Section II.E. The changes to Rule 702 can cement these developments and ensure more careful review of scientific expert evidence more broadly. We conclude by examining the lessons to be learned from this more-than-a-century-long arc of judicial review of firearms evidence in the United States for future judicial engagement with science.

I.  FIREARMS METHODS AND THE FIRST HALF-CENTURY OF JUDICIAL RULINGS

In this Part, we begin by describing the basic approach used by firearms and toolmark examiners. The approach has been in use for over a hundred years, and its origins trace to a single pioneering examiner, Major Calvin H. Goddard, who powerfully transformed courts’ early skepticism toward firearms comparison evidence into near-universal acceptance.56Calvin Hooker Goddard—Father of Forensic Ballistics, Forensic’s Blog, https://forensicfield.blog/calvin-hooker-goddard-father-of-forensic-ballistics [https://perma.cc/69BV-KYQE] (last visited Sept. 22, 2023). Considered the “father” of modern forensic firearms examination, Goddard assembled databases of information from gun makers and pioneered a “comparison microscope,” a device joining two microscopes in a single eyepiece so that two specimens can be viewed side by side, to make comparing firearms evidence more convenient.57Id. While the device was quite primitive compared with modern technology, Goddard’s introduction of the microscope into firearms comparison was seen as permitting a level of sophisticated visual analysis that a layperson lacked access to. We describe how, in the 1930s, Goddard often testified in trials about the comparison microscope, further cementing the method’s legitimacy to courts. Over time, other practitioners and crime laboratories adopted similar methods and began to testify as experts. We describe in this Part what reasoning courts used through the 1930s as they moved from early skepticism to acceptance of this expert testimony.

A.  A Primer on Firearm and Toolmark Identification

Toolmark identification is the practice of human observers opining on whether toolmarks were produced by a particular tool.58Id. A tool is any device that serves a mechanical purpose (for example, screwdrivers, pliers, knives, pipe wrenches). When a tool contacts a softer material, it sometimes leaves marks on the softer object’s surface. The resulting marks are called “toolmarks.”59One text gives the following example: “For example, when a butter knife is dragged along the surface of butter, one may observe a series of lines across the top of the butter. In this case, the mark in the butter is a toolmark and the knife is the tool that made the mark.” Ronald Nichols, Firearm and Toolmark Identification: The Scientific Reliability of the Forensic Science Discipline 1 (2018). A firearm consists of many tools that perform mechanical functions to fire a bullet. Therefore, firearm identification is considered a subspecialty of toolmark identification.60United States v. McCluskey, No. 10-2734, 2013 U.S. Dist. LEXIS 203723, at *7 (D.N.M. Feb. 7, 2013) (“Firearm identification is a specialized area of toolmark identification dealing with firearms, which involve a specific category of tools.”). The goal of firearm identification is to determine whether two bullets or cartridge cases were fired by the same firearm.

Firearm identification typically involves the examination of features or marks on either bullets or cartridge cases. A piece of unfired ammunition contains four components: (1) a cartridge case, (2) a primer, (3) propellant (gun powder), and (4) a bullet. The cartridge case holds the unit of ammunition together, with the bullet in its mouth. When an individual pulls the trigger of a firearm, a firing pin strikes the primer, which is at the head of the cartridge case. Striking the primer creates a spark that ignites the propellant. The ignition of the propellant forces the bullet to detach from the cartridge case and exit the barrel of the firearm. All of these operations have the potential to impart marks on the cartridge case, on the bullet, or on both. For example, manufacturers use firing pins with different shapes, which are often readily apparent on a fired cartridge case. Similarly, the barrel of the gun has grooves machined into it to impart a spiral spin on the bullet (akin to a football spiral)—different manufacturers use different numbers and directions of grooves.

Practitioners call these types of features “class characteristics.”61The official definition used by the professional Association of Firearms and Tool Mark Examiners is “[m]easurable features of a specimen which indicate a restricted group source. They result from design factors and are determined prior to manufacture.” Glossary of the Association of Firearm & Tool Mark Examiners 38 (6th ed. 2013). Class characteristics are the result of design features selected by the manufacturer. For example, a manufacturer may choose to use an elliptical-shaped firing pin or a barrel with six right-hand twisting grooves. The ammunition’s size is also a class characteristic. Class characteristics are a useful first step in firearm examination since observing differences in class characteristics can immediately rule out the possibility that two bullets or cartridge cases were fired by the same gun.

Agreement in class characteristics alone, however, is not sufficient to determine that bullets or cartridge cases were fired by the same gun. To draw that inference, examiners must identify and evaluate “individual characteristics,” which are defined by the AFTE as:

Marks produced by the random imperfections or irregularities of tool surfaces. These random imperfections or irregularities are produced incidental to manufacture and/or caused by use, corrosion, or damage. They are unique to that tool to the practical exclusion of all other tools.62Id. at 65.

Examiners rely on training and experience to assess whether striations are uniquely the result of a particular firearm (in other words, individual characteristics), as opposed to incidental striations that occurred during production and may be apparent in many different firearms of the same class.63These incidental striations are often called “subclass characteristics,” or features that may be produced during manufacture that are consistent among items fabricated by the same tool in the same approximate state of wear. These features are not determined prior to manufacture and are more restrictive than class characteristics. Subclass characteristics can easily be confused with individual characteristics. See Gene C. Rivera, Subclass Characteristics in Smith & Wesson SW40VE Sigma Pistols, 39 AFTE J. 247 (2007). Examiners following the AFTE protocol can reach one of several conclusions based on their evaluation of the individual characteristics: identification, elimination, inconclusive, or unsuitable for comparison.

There are no numeric thresholds for how many individual characteristics must be observed before the examiner can declare that two bullets or cartridge cases were fired by the same gun (that is, “an identification”). Rather, the AFTE protocol states that an identification can be reached “when the unique surface contours of two toolmarks are in ‘sufficient agreement.’ ”64AFTE Theory of Identification as it Relates to Toolmarks, The Ass’n of Firearm and Toolmark Examiners, https://afte.org/about-us/what-is-afte/afte-theory-of-identification [https://perma.cc/C498-FRH2]. As defined by the AFTE:

This “sufficient agreement” is related to the significant duplication of random toolmarks as evidenced by the correspondence of a pattern or combination of patterns of surface contours. . . . The statement that “sufficient agreement” exists between two toolmarks means that the agreement of individual characteristics is of a quantity and quality that the likelihood another tool could have made the mark is so remote as to be considered a practical impossibility.65Id.

This criterion of “sufficient agreement” has been roundly criticized by numerous commentators and courts for being “circular.”66See, e.g., PCAST Report, supra note 43, at 60 (“More importantly, the stated method is circular. It declares that an examiner may state that two toolmarks have a ‘common origin’ when their features are in ‘sufficient agreement.’ It then defines ‘sufficient agreement’ as occurring when the examiner considers it a ‘practical impossibility’ that the toolmarks have different origins.”). It is, however, the criterion adopted by the AFTE and widely used by practicing firearm examiners who conduct casework.67Nicholas Scurich, Brandon L. Garrett & Robert M. Thompson, Surveying Practicing Firearm Examiners, 4 For. Sci. Int’l: Synergy 1, 3 (2022).
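For readers who find a procedural summary useful, the examination flow described in this Section can be sketched schematically in code. The sketch below is purely our own expository device, not an AFTE work product: the names (Specimen, examine, and the judgment strings) are invented for illustration, and the pivotal “sufficient agreement” step remains a subjective human judgment with no numeric threshold, which is why it appears below as an input supplied by the examiner rather than as anything the program could compute.

from dataclasses import dataclass
from enum import Enum

class Conclusion(Enum):
    IDENTIFICATION = "identification"
    ELIMINATION = "elimination"
    INCONCLUSIVE = "inconclusive"
    UNSUITABLE = "unsuitable for comparison"

@dataclass
class Specimen:
    suitable: bool                # are the marks clear enough to compare?
    class_characteristics: tuple  # e.g., (caliber, firing pin shape, groove count and twist)

def examine(evidence: Specimen, test_fire: Specimen, examiner_judgment: str) -> Conclusion:
    # Items too damaged or too poorly marked cannot be compared at all.
    if not (evidence.suitable and test_fire.suitable):
        return Conclusion.UNSUITABLE
    # Step one: differing class characteristics immediately rule out a common source.
    if evidence.class_characteristics != test_fire.class_characteristics:
        return Conclusion.ELIMINATION
    # Step two: matching class characteristics alone are not enough; the examiner
    # must subjectively assess "sufficient agreement" of individual characteristics.
    if examiner_judgment == "sufficient agreement":
        return Conclusion.IDENTIFICATION
    if examiner_judgment == "significant disagreement":
        return Conclusion.ELIMINATION
    return Conclusion.INCONCLUSIVE

The sketch makes the critique discussed above easy to see: each objective, checkable step in the process is a rule-out, while the affirmative step of “identification” depends entirely on the examiner’s subjective judgment.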

B.  The Reception of Firearms Experts in U.S. Courts: 1902–1930

While there is increasingly voluminous scholarship regarding the early origins of gun control in the United States, we are not aware of scholarship exploring the early use of experts seeking to link firearms to particular shootings.68Instead, a body of historical work has explored early firearms regulation and related rights. See generally, e.g., Saul Cornell & Nathan DeDino, A Well Regulated Right: The Early American Origins of Gun Control, 73 Fordham L. Rev. 487 (2004); Charles R. McKirdy, Misreading the Past: The Faulty Historical Basis Behind the Supreme Court’s Decision in District of Columbia v. Heller, 45 Cap. U. L. Rev. 107 (2017). In this Section, we detail what we learned from assembling our database of firearms rulings, collected using searches of legal databases and supplemented with unpublished trial court orders where available.69See supra note 28 for a description of the database and a link to it. As we will describe, twenty-nine of the earliest rulings predated Frye v. United States, a 1923 case that formed the basis for the federal standard for judicial review of novel expert evidence: a requirement of “general acceptance” within the relevant scientific community.70Frye v. United States, 293 F. 1013, 1014 (D.C. Cir. 1923). Further, none of the eleven rulings decided from 1923 to 1930 cited to Frye—we did not see courts relying on the Frye standard until many decades later. Many of these rulings, absent clear rules of evidence concerning expert testimony, instead focused on whether experts could assist or inform the jury.71Today, such a standard is reflected in Federal Rule of Evidence 702(a). See Fed. R. Evid. 702(a) (asking whether “the expert’s scientific, technical, or other specialized knowledge will help the trier of fact to understand the evidence or to determine a fact in issue”). The earliest rulings date back to the 1870s, and they were quite mixed on whether it was erroneous or correct to have admitted expert testimony concerning firearms.72The earliest ruling that we located, Moughon v. State, found error to admit the testimony. 57 Ga. 102, 106 (Ga. 1876). So did Brownell v. People, 38 Mich. 732, 738 (Mich. 1878). But see Dean v. Commonwealth, 32 Gratt. 912, 927–28 (Va. 1879) (holding that it was not erroneous to admit firearms comparison testimony); Sullivan v. Commonwealth, 93 Pa. 284, 296–97 (Penn. 1880) (same).

One of the earliest reported cases discussing firearms comparison evidence, Commonwealth v. Best,73Commonwealth v. Best, 62 N.E. 748 (Mass. 1902). was written in 1902 by none other than Oliver Wendell Holmes, then the Chief Justice of the Massachusetts Supreme Judicial Court. Best was convicted of murder and argued on appeal that the admission of certain firearms comparison evidence at trial was erroneous.74Id. at 749–50. The State argued at trial that Best shot a milkman twice with a Winchester rifle found in Best’s kitchen.75Id. at 750. To prove this, the State fired a third bullet through the gun, took a photograph of it, and published photographs of this bullet and the bullets found in the victim’s body as evidence.76Id.

In conjunction with these photographs, the State called an expert witness to “testif[y] that [the bullets] were marked by rust in the same way that they would have been if they had been fired through the rifle at the farm, and that it took at least several months for the rust that he saw in the rifle to form.”77Id. In other words, the bullets found at the crime scene were rusted only because they were fired through the rusty barrel of Best’s rifle.78Id. Best’s counsel argued—at trial and on appeal—that the evidence was inadmissible because “the conditions of the experiment did not correspond accurately with those of the date of the shooting,” that “the force impelling the different bullets were different in kind,” that “the rifle barrel might be supposed to have rusted more in the little more than a fortnight that had intervened, and that it was fired three times on [the murder date], which would have increased the leading of the barrel.”79Id. To wit: environmental factors called the expert’s conclusion into question.

In his quintessentially succinct style, Justice Holmes swiftly disposed of these arguments, concluding that expert testimony was the only way “the jury could have learned so intelligently how that gun barrel would have marked a lead bullet fired through it,” and “the sources of error suggested were trifling.”80Id. Indeed, despite this being one of the first published opinions that we could find on the admissibility of firearms toolmark evidence, Justice Holmes found “no reason to doubt that the testimony was properly admitted.”81Id. Rejecting the other arguments that Best made on appeal, the court upheld the conviction.82Id.

On the West Coast, four years later, the California Supreme Court decided People v. Weber, a 1906 case that also involved crude firearms comparison evidence. Four members of the Weber family had been killed on their property, three from gunshots and one from blunt force trauma.83People v. Weber, 86 P. 671, 673 (Cal. 1906). Police found a .32-caliber revolver in the basement of the Weber barn with dried blood on it, along with five discarded cartridges.84Id. at 673–74. The defendant was tried and convicted of one of the murders, and he appealed.85Id. at 674. During the trial, the State called “an expert in small arms” who testified that he “compared the markings on the bullets taken from the bodies with the markings on the bullets which he had fired from the pistol,” concluding that these bullets were all fired from the alleged murder weapon.86Id. at 678. While the trial court initially admitted this testimony, the next day, the court struck it, concluding “the comparison of the . . . bullets . . . is not a matter of expert testimony, but one within the ordinary capacities of the average juror or citizen.”87Id. (emphasis added). Thus, the testimony was excluded, but the bullets were all admitted into evidence for the jury to compare during deliberations. On appeal, the California Supreme Court did not disturb the trial court’s ruling, but it did reject the defense’s argument that admitting the bullets into evidence was erroneous.88Id. The court instead held that admitting the evidence to help the jury identify the murder weapon “was pertinent and important.”89Id.

In the 1920s, courts gradually moved toward considering firearms examiners as expert witnesses. In State v. Clark,90State v. Clark, 196 P. 360 (Or. 1921). the Oregon Supreme Court considered a criminal appeal of a manslaughter conviction. Charles Taylor, a worker in Oregon’s National Cascade Forest Reserve, was part of a group assigned to bridge maintenance. Each worker brought a .30-30 Winchester rifle, hoping to hunt “camp meat.”91Id. at 362. One night, Clark and Taylor began hunting, and each fired an initial shot so they could use the spent cartridges as communication whistles.92Id. at 362–63. Taylor then left to hunt, but he was never seen alive again. The subsequent search party found a shell near Taylor’s body and an empty shell in the barrel of Taylor’s gun.93Id. at 367. According to the court, both shells “bore on the brass part of the primer a peculiar mark evidently caused by a flaw in the breechblock of the gun from which they had been fired.”94Id. This flaw “caused a very slight, almost microscopic protuberance in the primer of the shell, which enlarged photographs ma[de] very clear to the naked eye.”95Id. Law enforcement fired several shots from Clark’s gun, and the cartridges produced the same mark.96Id. Additionally, Clark’s gun created a “sort of double scratch” on the inside of the rim of each shell fired, while Taylor’s gun “made only a single scratch.”97Id. Because of this, the court eliminated “the theory that deceased might have been accidentally shot with his own gun.”98Id. The court held these tests had produced “strong evidence that [Clark] was present and fired the shot that killed Taylor.”99Id. This evidence was presented at trial by the sheriff, who described the marks but does not appear to have offered more specific conclusions.100Id. at 370. Clark’s counsel objected to this testimony and to the admission of the photographs, but no specific reason for the objection was provided101Id. The only specific objection regarding the shells was that the photographs were impermissibly enlarged, which the court rejected. Id. at 371.—unsurprising because rules surrounding lay and expert witnesses were less formal in this era. The court held that the testimony was proper and the evidence was admissible.102Id. at 370–71.

In a 1922 case, the Alabama Supreme Court explicitly held—unlike in the cases discussed so far—that firearms comparison examiners could testify as expert witnesses.103Pynes v. State, 92 So. 663, 665 (Ala. 1922). Earlier cases had done so, without much discussion. See, e.g., Sullivan v. Commonwealth, 93 Pa. 284, 296–97 (1880). The defendant had been convicted of killing a man and his dog by gunshot.104Pynes, 92 So. at 665. Police had found a revolver near the victim’s body; one cartridge in its chamber had been discharged.105Id. The State called someone “familiar with such things, [who] had used pistols and shells a good deal,” to testify as an expert.106Id. This expert claimed that the casing in the empty chamber and the barrel of the revolver demonstrated that the weapon “had not been discharged recently.”107Id. The defense unsuccessfully objected, arguing that the person was not an expert.108Id. On appeal, the Alabama Supreme Court upheld the testimony: “A witness may have expert knowledge of some of the more ordinary affairs of life.”109Id. For a case from the next year finding a similar expert “competent” and any error harmless, see Laney v. United States, 294 F. 412, 416 (D.C. Cir. 1923).

In a 1923 case, however, the Illinois Supreme Court powerfully objected to expert evidence on firearms comparison.110People v. Berkman, 139 N.E. 91, 94–95 (Ill. 1923). The court reversed the conviction on appeal for multiple reasons,111Id. at 94. but it particularly took issue with the State’s use of a police officer as an expert. At trial, a police officer testified for the State that a gun in evidence was the one fired at the victim because it “was the identical revolver from which the bullet introduced in evidence was fired on the night [the victim] was shot.”112Id. The officer was “asked to examine the Colt automatic .32 aforesaid, and gave it as his opinion that the bullet introduced in evidence was fired from the Colt automatic revolver in evidence.”113Id. The court also questioned the qualifications of the officer:

The state sought to qualify [the officer] for such remarkable evidence by having him testify that he had had charge of the inspection of firearms for the last 5 years of their department; that he was a small-arms inspector in the National Guard for a period of 9 years; and that he was a sergeant in the service in the field artillery, where the pistol is the only weapon the men have, outside of the large guns or cannon.

Id.
The court emphasized:

He even stated positively that he knew that that bullet came out of the barrel of that revolver, because the rifling marks on the bullet fitted into the rifling of the revolver in question, and that the markings on that particular bullet were peculiar, because they came clear up on the steel of the bullet.114Id. (emphasis added).

The court elaborated:

The evidence of this officer is clearly absurd, besides not being based upon any known rule that would make it admissible. If the real facts were brought out, it would undoubtedly show that all Colt revolvers of the same model and of the same caliber are rifled precisely in the same manner, and the statement that one can know that a certain bullet was fired out of a 32-caliber revolver, when there are hundreds and perhaps thousands of others rifled in precisely the same manner and of precisely the same character, is preposterous.115Id.

Finally, the court focused on lay versus expert opinions:

Mere opportunity does not change an ordinary observer into an expert, and special skill does not entitle a witness to give an opinion, when the subject is one where the opinion of an ordinary observer is admissible, or where the jury are capable of forming their own conclusions from the pertinent facts susceptible of proof in common form. . . . If any facts pertaining to the gun and its rifling existed by which such fact could be known, it would have been proper for the witness to have stated such facts and let the jury draw their own conclusions.116Id. at 95 (emphasis added).

The court thus strongly rejected admitting an expert to opine on such firearms evidence.117Id.

By the late 1920s, however, judicial rulings began to shift as the work of Major Goddard became better known. Goddard founded a private crime laboratory—“The Bureau of Forensic Ballistics”118For a detailed account, see Heather Wolffram, Teaching Forensic Science to the American Police and Public: The Scientific Crime Detection Laboratory, 1929-1938, 11 Acad. Forensic Path 52, 55 (2021).—and published the American Journal of Police Science. Goddard became particularly well-known for assisting with the investigation in the Sacco and Vanzetti case in Massachusetts and in the St. Valentine’s Day Massacre in Chicago in 1929.119Id. Before Goddard published Forensic Ballistics, his seminal 1925 article on ballistic evidence for the U.S. Army, many judges, as described above, viewed firearms comparison as a crude technique that jurors could conduct themselves by visually examining the evidence.120Id.

This began to change. For example, in a 1928 Kentucky case, Jack v. Commonwealth, the state supreme court discussed firearms comparison testimony and found the evidence “important if competent, but highly prejudicial if incompetent.”121Jack v. Commonwealth, 1 S.W.2d 961, 963 (Ky. 1928). The court discussed an article by Major Goddard in Popular Science Monthly122Citing Goddard’s article, the court stated that “the subject of ballistics . . . has reached the status of an exact science.” Id. at 963. and summarized the process:

[T]here is in use a special microscope consisting of two barrels so arranged that both are brought together in one eyepiece. The fatal bullet is placed under one of these barrels, and a test bullet that has been fired through defendant’s pistol is placed under the other barrel, and this brings the sides of the two bullets together and causes them to fuse into one object. If the grooves and other distinguishing marks on both bullets correspond, it is said to show that both balls were fired from the same pistol.123Id. at 963–64.

The court concluded:

It thus appears that this is a technical subject, and in order to give an expert opinion thereon a witness should have made a special study of the subject and have suitable instruments and equipment to make proper test . . . . Clearly the witnesses in this case were not qualified to give such opinions and conclusions and the admission of such evidence was erroneous and prejudicial.124Id. at 964 (emphasis added).

The court therefore rejected the testimony not because it doubted the method itself but because the proffered experts did not follow proper practices.

One year after Jack, the Kentucky Supreme Court again examined firearms comparison testimony in Evans v. Commonwealth.125Evans v. Commonwealth, 19 S.W.2d 1091 (Ky. 1929). The defendant, Evans, was indicted for the murder of the Pineville, Kentucky, chief of police, and he was ultimately convicted of manslaughter.126Id. at 1092. Six shots were fired in the murder, and police had dug up a bullet from the ground near the scene.127Id. Evans’s primary argument on appeal was that the firearms comparison evidence was improper, so the court addressed it “with some degree of elaboration.”128Id. at 1093. The court referenced Jack and noted that one month after Jack was published, Major Goddard—who wrote the article referenced by the court in Jack—offered to testify.129Id. Goddard was given the defendant’s automatic .45 pistol, seven cartridges taken from this pistol, six cartridges found at the scene of the crime, and the bullet that police had taken from the dirt.130Id. at 1094. Goddard concluded “that he was convinced that the bullet that had been introduced into evidence had been fired through [Evans’s] pistol.”131Id. (emphasis added). To justify this conclusion, Goddard gave a detailed account of how he compared the different bullets by putting “the two bullets under the two microscopes together, [so that] in the center . . . you see a single bullet. . . . [I]f these bullets were fired through the same pistol they will match . . . .”132Id. at 1095. Goddard testified that he “only required one single test to identify the bullet in evidence as having been fired through the Evans pistol.”133Id. (emphasis added).

During Goddard’s cross-examination, the jury was allowed to examine the evidence using the microscope.134Id. at 1096. The defense objected that Goddard’s conclusion was one of fact that the jury should instead determine.135Id. at 1097. The court rejected this argument.136Id. Interestingly, the court concluded that Goddard’s opinion was an ordinary lay opinion, not that of an expert.137Id. The court compared Goddard’s testimony to that of a lay witness saying that “he could smell gasoline,” even though “the average man would have great difficulty in telling just how coal oil or gasoline smells, though acquainted with their odors.” Id. Cross-examination was thus a sufficient safeguard, and “rigid adherence” to the rules of evidence “would be subversive of the ends for which they were adopted.”138Id. The defense also objected to the jury looking through the microscopes, an objection the court quickly dismissed as without “well-founded reason.”139Id.

These two Kentucky Supreme Court opinions formed the framework for the modern approach to firearms comparison evidence. Jack demonstrates that courts would not always let a specific person testify as a qualified expert on firearms comparison. But Evans shows that the courts were not concerned about the underlying validity of the methodology of firearms comparisons. If the State could produce a witness in the mold of Major Goddard, following the now-respected comparison microscope methodology, then the testimony would routinely be admitted.

C.  A National Body of Firearms Rulings: 1930s to 1960s

Beginning in the 1930s, judges further developed case law in other parts of the country, with new experts testifying. We identified forty rulings from 1931 to 1970, each set out in our database. During this period, rulings spread nationally, and judges appeared powerfully influenced by Evans,140Evans v. Commonwealth, 19 S.W.2d 1091 (Ky. 1929). which became one of the lodestar cases for adoption of firearms comparison evidence. Use of toolmark evidence for firearms comparison began to be called “accepted” and “well-recognized” as a methodology. As time went on, judges simply cited to Evans and other prototypical early cases to admit expert testimony, and discussion of the merits of firearms comparison methods diminished. Further, defendants increasingly did not challenge the evidence itself but rather focused on the preservation of evidence or the qualifications of the testifying experts. These challenges were almost always unsuccessful.

In 1937, for example, the Florida Supreme Court briefly concluded that a firearms comparison expert was “fully qualified to testify as an expert . . . and to draw a reliable conclusion as to whether or not the bullet found in the body of the deceased was fired from the pistol introduced in evidence.”141Riner v. State, 176 So. 38, 39–40 (Fla. 1937). In a Missouri case that same year, the expert himself explained that he “was not a ballistic expert,” but he maintained that he had “much experience in the work of identifying firearms.”142State v. Couch, 111 S.W.2d 147, 149 (Mo. 1937). Despite this concession, the court concluded that “he was an expert in the identification of firearms and bullets by the comparison method by means of a microscope.”143Id. In 1938, an Oklahoma appellate court further explained:

There were few decisions with reference to the introduction of expert testimony to identify the weapon from which a shot was fired until recent years, but the science of ballistics is now recognized as one of the best methods in ferreting out crime that could not otherwise be detected. Expert evidence to identify the weapon from which a shot was fired is generally admitted under the rules covering other forms of expert testimony, and it is the modern tendency of the courts to allow the introduction of such testimony, where the witness’ preparation as shown by experience and training qualifies him to give expert opinion on firearms and ballistics tests.144Macklin v. State, 76 P.2d 1091, 1095 (Okla. Crim. App. 1938) (emphasis added).

By 1940, experts could cite fifteen years of experience in “the firing of different caliber pistols,” which was enough to qualify a person as a firearms comparison expert.145McGuire v. State, 194 So. 815, 816 (Ala. 1940). In a 1941 case in Virginia, an expert from the FBI testified that he had twenty years of experience, “six of which had been devoted to the examination of firearms.”146Ferrell v. Commonwealth, 14 S.E.2d 293, 295 (Va. 1941). The expert testified that the cartridge he examined was fired by the defendant’s shotgun.147Id. at 296. The reviewing court cited to Evans,148Id. at 297. as courts continued to do. In State v. McKeever, for example,149State v. McKeever, 101 S.W.2d 22 (Mo. 1936). the expert testified that this was his 191st trial; the court admitted the evidence without discussion, simply citing to Evans.150Id. at 29. Increasingly brief opinions found “no error” in the introduction of such testimony.151See, e.g., Pilley v. State, 25 So.2d 57, 60 (Ala. 1946) (“In the introduction of this evidence there was no error.”); Kyzer v. State, 33 So.2d 885, 887 (Ala. 1947) (finding no error without explanation). In Collins v. State, 33 So.2d 18, 20 (Ala. 1947), the court overruled objections to the expert testimony, stating: “We have had occasion several times to consider questions of this sort, and the principles of law applicable to the same have been repeated frequently, so that it will not be necessary to do so again . . . .” Yet, in none of those prior opinions did the court actually repeat or state its reasoning.

There were some outliers. For example, a 1948 New Mexico Supreme Court ruling held that it was reversible error to admit “ballistic expert” testimony that purported to match a specific gun to the bullet that killed the victim.152State v. Martinez, 198 P.2d 256, 257–61 (N.M. 1948). After being qualified, the expert testified about his methodology, calling the firearm’s marks “absolutely identical.”153Id. at 257–58. The court was concerned that the expert had concluded with statements such as: “I will state positively that the evidence bullet (death bullet) was fired out of State’s Exhibit No. 2, this [defendant’s] gun.”154Id. at 260 (emphasis added). The court emphasized that while firearms comparison is “almost, if not an exact science,” and “judicial notice may be taken” of the method, ballistic experts still must, “like . . . experts generally,” only provide “opinion testimony.”155Id. While “[i]t may be true that such witnesses as Colonel Goddard, who testified in Evans v. Commonwealth and other reported cases, are so skilled in the science of forensic ballistics that the chance of error is negligible,” they are the exception.156See id. at 261 (citation omitted). Yet, “[t]he belief of a witness that his skill is so transcendent that an error in judgment is impossible, may itself be false or a mistake, assuming that the science is exact.”157Id.

In a rare 1951 Georgia Supreme Court case, Henderson v. State, the court excluded firearms comparison testimony due to concerns with the specific expert. The defense attorney asked the expert “why he did not measure the distance and depth of the grooves, and the witness explained by giving the reply that the microscope was the highest and best evidence.”158Henderson v. State, 65 S.E.2d 175, 177 (Ga. 1951). The court held that the answer was not a “response to the question propounded,”159Id. that the right to a “thorough and sifting cross-examination” was violated, and that the judgment should be reversed for a new trial.160Id.

In a Maryland case, the defendant also attacked the State’s firearms comparison testimony.161Edwards v. State, 81 A.2d 631, 635 (Md. 1951). The court emphatically rejected this position:

For many years ballistics has been a science of great value in ferreting out crimes that otherwise might not be solved. When a pistol is fired, a pressure is developed within the shell which drives the bullet out of the barrel, and the shell is driven back against the breech of the pistol with similar force. The markings on the hard breech of the pistol are thereby stamped on the soft butt of the shell. Testimony to identify the weapon from which a shot was fired is admissible where it is shown that the witness offering such testimony is qualified by training and experience to give expert opinion on firearms and ammunition.162Id.

The court cited back to Best and Evans to justify this result, despite the faintness of the marks and the acknowledgment that the marks could have been left by a different type of weapon.163See id. at 635–36 (noting that “it was admittedly possible that the bullets could have been fired from a Luger” rather than the defendant’s gun).

In a 1964 Florida case, the court provided the following explanation about the recognition of firearms comparison testimony:

It is now well established that a witness, who qualifies as an expert in the science of ballistics, may identify a gun from which a particular bullet was fired by comparing the markings on that bullet with those on a test bullet fired by the witness through the suspect gun. An expert will be permitted to submit his opinion based on such an experiment conducted by him. The details of the experiment should be described to the jury.164Roberts v. State, 164 So. 2d 817, 820 (Fla. 1964).

Finally, a 1969 Illinois appellate case offers some of the earliest descriptions of class and individual characteristics, the predominant terminology in modern firearms comparison testimony:

When a weapon is received at the laboratory it is classified as to type, caliber, make and model. Each gun has class characteristics common to its particular make and model. In addition, each gun has its own individual characteristics. . . . After the gun is received at the laboratory, if operable, it is fired into a bullet recovery box. The bullet in question is then compared with the test bullet under a comparison microscope.165People v. O’Neal, 254 N.E.2d 559, 561–62 (Ill. App. Ct. 1969) (emphasis added).

During this time, courts routinely rejected challenges to firearms experts’ qualifications.166See, e.g., United States v. Hagelberger, 9 C.M.R. 226, 233–34 (1952). And expert qualifications only increased: by this time, some experts testified that they had worked on “approximately three to four thousand cases of ballistics.”167Gipson v. State, 78 So. 2d 293, 297 (Ala. 1955). Judicial review of forensic evidence in the following decades involved significant deference, with trial courts deferring to the expert witnesses, and then the appellate courts deferring to the trial courts. Often, courts focused on the specific examiner’s experience rather than assessing the field’s foundational validity.168This deference, however, has not been universal. For a more recent ruling, see State v. Raynor, 254 A.3d 874, 887–88 (Conn. 2020) (noting that refusing to consider new information as a scientific field evolves “would transform the trial court’s gatekeeping function . . . into one of routine mandatory admission of such evidence, regardless of advances in a particular field and its continued reliability”).

D.  Pre-Daubert Cases

In the 1970s and 1980s, leading up to the Daubert ruling in 1993, courts routinely admitted firearms expert testimony, often without discussion.169See, e.g., Hampton v. People, 465 P.2d 394, 400 (Colo. 1970) (stating there was no abuse of discretion for admitting a firearm comparison expert’s testimony). For perhaps the first case referring to the discipline as a type of toolmark comparison, see United States v. Bowers, 534 F.2d 186, 193 (9th Cir. 1976). We located only twenty-four such rulings, perhaps because unpublished rulings became far more common given the broader acceptance of such expert testimony. While challenges to expert qualifications typically failed—with courts citing to the experience of the examiner—courts generally expected examiners to also possess specialized training and credentials.170See, e.g., State v. Hunt, 193 N.W.2d 858, 867 (Wis. 1972) (stating “the witness had great experience in the field of ballistics”); Acoff v. State, 278 So. 2d 210, 217 (Ala. 1973) (concluding expert testimony of witness with “more than six years” of firearms comparison training was “properly allowed”); People v. McKinnie, 310 N.E.2d 507, 510 (Ill. App. Ct. 1974) (finding examiners’ “considerable practical experience” was sufficient, despite lack of “scientific” training). But see State v. Seebold, 531 P.2d 1130, 1132 (Ariz. 1975) (affirming exclusion of proffered experts at trial in which one admitted “he was not a scientist or a criminalist” and the second was a gunsmith and gun shop owner who “had no formal education in the field of ballistics and had never testified before in this field”); Cooper v. State, 340 So. 2d 91, 93 (Ala. Crim. App. 1976) (“The State, in attempting to establish Charles Wesley Smith as an expert in ballistics, elicited some general information on his background, but failed to establish many specific facts to support his expertise in the field of ballistics.”); Bowden v. State, 610 So. 2d 1256, 1258 (Ala. Crim. App. 1992) (affirming trial court’s exclusion of firearms expert’s testimony because it was not a “clear abuse of . . . discretion”).

Some courts excluded firearms testimony based on other issues.171See, e.g., Johnson v. State, 249 So. 2d 470, 472 (Fla. Dist. Ct. App. 1971) (reversing admission of firearms testimony because the State could not produce the bullet taken from the deceased for examination). In a federal case, the defendant was denied access to an expert to examine the evidence, which the court found particularly problematic given the quality of the evidence itself, as “seventy-five percent of this slug was destroyed and the identification was made on the remaining 25%.”172Barnard v. Henderson, 514 F.2d 744, 746 (5th Cir. 1975). Other cases relied on the Confrontation Clause, including one in which a police officer testified about a report by an examiner who was not present at trial.173Stewart v. Cowan, 528 F.2d 79, 82–83 (6th Cir. 1976). Still other courts considered whether experts sufficiently described their work.174People v. Miller, 334 N.E.2d 421, 429 (Ill. App. Ct. 1975). Other cases found it sufficient to admit testimony finding similar class characteristics, even when there was not enough information to compare any individual characteristics. See, e.g., State v. Bayless, 357 N.E.2d 1035, 1058–59 (Ohio 1976).

In general, experts continued to reach highly aggressive conclusions that were permitted by courts. For example, the expert in a 1981 Wyoming case concluded, “The markings on the bullets from the home of appellant’s brother matched the markings found on the bullet removed from [the defendant], establishing that they had been fired from the same gun.”175McDaniel v. State, 632 P.2d 534, 535 (Wyo. 1981). In a leading Virginia case, an expert testified he was “certain” one of the bullets removed from the victim’s body was fired from the defendant’s pistol, and there was “no margin of error.”176Watkins v. Commonwealth, 331 S.E.2d 422, 434 (Va. 1985). The defendant argued on appeal that this “no margin of error” statement was impermissible.177Id. The court rejected this argument, simply concluding that the statement went toward the weight of the testimony, not its admissibility.178Id.

Pre-Daubert, some defendants did contest whether firearms experts relied on sufficient facts and data. In an illustrative Utah case, the expert testified at a preliminary hearing that a bullet fired from the alleged murder weapon matched a bullet taken from the victim’s body.179State v. Schreuder, 712 P.2d 264, 268 (Utah 1985). But while he gave this conclusion, he was not able to give “an exact description of the striations, nor did he have photographs of them available with him in court.”180Id. The court rejected arguments that the expert did not have sufficient foundation for his conclusion, holding that the testimony was within the expert’s specialized knowledge.181Id. at 268–69.

II.  MODERN SCIENTIFIC ASSESSMENTS AND GROWING JUDICIAL SKEPTICISM OF FIREARMS EVIDENCE

Following the U.S. Supreme Court’s ruling in 1993 in Daubert v. Merrell Dow Pharmaceuticals, Inc., federal courts began to scrutinize firearms evidence more carefully, although exclusion remained rare.182See, e.g., Melcher v. Holland, No. 12-0544, 2014 U.S. Dist. LEXIS 591, at *42–44, 51 (N.D. Cal. Jan. 3, 2014) (finding no ineffective assistance of counsel and highlighting that it was proper to admit firearms evidence); United States v. Sebbern, No. 10 Cr. 87, 2012 U.S. Dist. LEXIS 170576, at *21–24 (E.D.N.Y. Nov. 29, 2012) (finding a hearing unnecessary when other courts had examined the reliability of firearms evidence). Daubert led to the revision of Federal Rule of Evidence 702 in 2000, which established new standards for assessing the reliability of scientific expert testimony. Many of the defendants’ objections shifted from concerns about the experts’ qualifications to concerns about the reliability of the methodology and conclusions,183See, e.g., Abruquah v. State, No. 2176, 2020 Md. App. LEXIS 53, at *19–25 (Md. Ct. Spec. App. Jan. 17, 2020) (defense objections regarding methodology and expert conclusion language); United States v. Mouzone, 687 F.3d 207, 215–17 (4th Cir. 2012) (defense objections focused on expert allegedly violating limits imposed by judge on conclusion language). and about the use of inadmissible hearsay evidence as a basis for the experts’ conclusions.184See, e.g., United States v. Corey, 207 F.3d 84, 87–92 (1st Cir. 2000); Green v. Warren, No. 12-6148, 2013 U.S. Dist. LEXIS 179765, at *21–22 (D.N.J. Dec. 20, 2013). At the state level, there was no immediate difference in how courts approached firearms expert testimony post-Daubert; methodology and expert qualifications were more explicitly mentioned, but the overall analysis largely remained the same.185See, e.g., State v. Gainey, 558 S.E.2d 463, 473–74 (N.C. 2002).

In the late 1990s and early 2000s, courts began rejecting expert firearms comparison testimony as unreliable, largely by relying on Daubert. In our database, we include just seven cases from 1993–2000. However, the number of rulings began to increase dramatically after 2000, with 188 rulings from 2000 to 2022. We turn next to that rich body of modern case law.

Figure 1.  Reported U.S. Firearms Rulings by Decade


Figure 1 illustrates this remarkable trend—one can see a fairly steady number of twenty or fewer reported judicial rulings regarding firearms comparison evidence per decade through the 1990s. Yet, beginning in the early 2000s, these rulings began to increase dramatically in number.

The Supreme Court in Daubert revolutionized judicial review of scientific evidence by setting out five factors for courts to consider in evaluating expert testimony: whether the theory or technique relied on (1) can be (and has been) tested, (2) has been subjected to peer review and publication, (3) has a known or potential rate of error, (4) is subject to standards controlling its operation, and (5) is generally accepted within the relevant scientific community.186Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579, 593–94 (1993). We provide an overview of each factor and how courts generally have reviewed them in the context of firearms comparison testimony.

First, courts generally have not questioned the “testability” of firearms forensics, a “key question” when examining reliability.187Id. at 593. A series of courts have held that the propositions that “firearms leave discernible toolmarks on bullets and cartridge casings fired from them, and that trained examiners can conduct comparisons to determine whether a particular gun has fired particular ammunition . . . can be, and have been, tested.”188United States v. Tibbs, No. 2016 CF1 19431, 2019 D.C. Super. LEXIS 9, at *25 (D.C. Super. Ct. Sept. 5, 2019); see also United States v. Monteiro, 407 F. Supp. 2d 351, 369 (D. Mass. 2006) (“[T]he existence of the requirements of peer review and documentation ensure sufficient testability and reproducibility to ensure that the results of the technique are reliable.”); United States v. Otero, 849 F. Supp. 2d 425, 433 (D.N.J. 2012) (“Though [it] inherently involves the subjectivity of the examiner’s judgment as to matching toolmarks, the AFTE theory is testable on the basis of achieving consistent and accurate results.”); United States v. Romero-Lobato, 379 F. Supp. 3d 1111, 1118 (D. Nev. 2019) (“There is little doubt that the AFTE method of identifying firearms satisfies [the testing requirement].”); United States v. Ashburn, 88 F. Supp. 3d 239, 245 (E.D.N.Y. 2015) (“The AFTE methodology has been repeatedly tested.”).

Second, many courts have determined the AFTE method of toolmark identification has been subject to sufficient peer review and publication, largely through the AFTE Journal.189See, e.g., Ashburn, 88 F. Supp. 3d at 245–46 (finding AFTE method has been subjected to peer review through the AFTE Journal); Otero, 849 F. Supp. 2d at 433 (describing the Journal’s peer reviewing process and finding the methodology subject to peer review); United States v. Taylor, 663 F. Supp. 2d 1170, 1176 (D.N.M. 2009) (finding AFTE method subjected to peer review through AFTE Journal and two articles submitted by the government in peer-reviewed journal about the methodology); Monteiro, 407 F. Supp. 2d at 366–67 (describing AFTE Journal’s peer reviewing process and finding it meets peer review element). However, courts are beginning to more rigorously inspect the validity of the peer review process at that journal. Prior to January 2020, the AFTE Journal used a highly unusual “open-review” process whereby the identities of the authors and the reviewers were disclosed and direct communication was encouraged. Furthermore, all of the reviewers were members of AFTE who “ha[d] a vested, career-based interest in publishing studies that validate their own field and methodologies.”190Tibbs, 2019 D.C. Super. LEXIS 9, at *33. These factors led a D.C. Superior Court judge to conclude in 2019: “[T]he vast majority of [firearms comparison] studies are published in a journal that uses a flawed and suspect review process, [which] greatly reduces its value as a scientific publication.”191Id. at *35. Therefore, the peer review factor “on its own does not, despite the sheer number of studies conducted and published, work strongly in favor of admission of firearms and toolmark identification testimony.”192Id. at *36. Nevertheless, courts have cited to other studies or reports to validate the soundness of toolmark comparison—one federal court curiously cited to the 2009 NAS and 2016 PCAST reports as evidence of peer review, despite how damning those reviews are of the method.193See Romero-Lobato, 379 F. Supp. 3d at 1119 (D. Nev. 2019) (“[O]f course, the NAS and PCAST Reports themselves constitute peer review despite the unfavorable view the two reports have of the AFTE method. The peer review and publication factor therefore weighs in favor of admissibility.”). But see United States v. Tibbs, No. 2016 CF1 19431, 2019 D.C. Super. LEXIS 9, at *29 (D.C. Super. Ct. Sept. 5, 2019) (“If negative post-publication commentary from an external reviewing body can satisfy this prong of the Daubert analysis, then the peer reviewed publication component would be more or less read out of Daubert, leaving behind only the requirement of some type of publication.”).

Third, courts have tended to view the error rate for forensic firearms testing as low, though they also sometimes acknowledge that the error rate is “presently unknown.”194United States v. Johnson, No. (S5) 16 Cr. 281 (PGG), 2019 U.S. Dist. LEXIS 39590, at *55 (S.D.N.Y. Mar. 11, 2019) (citing Ashburn, 88 F. Supp. 3d at 246; United States v. Diaz, No. CR 05-00167 WHA, 2007 U.S. Dist. LEXIS 13152, at *27 (N.D. Cal. Feb. 12, 2007)). One federal court concluded that “it is not possible” to calculate an absolute error rate for firearms analysis because “the process is so subjective and qualitative.”195United States v. Monteiro, 407 F. Supp. 2d 351, 367 (D. Mass. 2006). This third factor is particularly important for rigorous assessment because “an expert witness’s ability to explain the methodology’s error rate—in other words, to describe the limitations of her conclusion—is essential to the jury’s ability to appropriately weigh the probative value of such testimony.”196Tibbs, 2019 D.C. Super. LEXIS 9 at *37. Faced with numerous studies purporting to show extremely low error rates, many courts have simply accepted the conclusion that forensic firearms testing has only a nominal error rate197See Ashburn, 88 F. Supp. 3d at 246 (“[T]he error rate, to the extent it can be measured, appears to be low, weighing in favor of admission.”); United States v. Otero, 849 F. Supp. 2d 425, 433–34 (D.N.J. 2012) (summarizing several studies indicating a low error rate); United States v. Taylor, 663 F. Supp. 2d 1170, 1177 (D.N.M. 2009) (“[T]his number [less than 1%] suggests that the error rate is quite low.”); Monteiro, 407 F. Supp. 2d at 367–68 (summarizing relevant studies and finding that the known error rate is not “unacceptably high”). or has “a false positive rate of 1.52%.”198Romero-Lobato, 379 F. Supp. 3d at 1120.

In more recent years, as we discuss in a later Section in more detail, courts have begun to reexamine the validity of the error studies and rates presented.199See infra Section II.D; State v. Terrell, No. CR170179563, 2019 Conn. Super. LEXIS 827, at *3 (Conn. Super. Ct. Mar. 21, 2019) (“[The toolmark field] is also not static. A methodology may at one time be viewed as reliable by the scientific community and later fall out of favor.”). Citing basic design flaws of most studies in the field and the studies’ failure to address a large number of “inconclusive” results, one court, for example, found “it difficult to conclude that the existing studies provide a sufficient basis to accept the low error rates for the discipline that these studies purport to establish.”200Tibbs, 2019 D.C. Super. LEXIS 9 at *40–41. Other courts noted concerns with the lack of rigorous testing but did not find this sufficiently persuasive to exclude the evidence outright.201Romero-Lobato, 379 F. Supp. 3d at 1120 (“While the Court is cognizant of the PCAST Report’s repeated criticisms regarding the lack of true black box tests, the Court declines to adopt such a strict requirement for which studies are proper and which are not. Daubert does not mandate such a prerequisite for a technique to satisfy its error rate element.”).

Fourth, many judges have focused on how the AFTE methodology lacks clearly defined, objective standards. Judges have variously described the AFTE method as “inherently vague,”202United States v. Glynn, 578 F. Supp. 2d 567, 572 (S.D.N.Y. 2008). “more of a description of the process of firearm identification rather than a strictly followed charter for the field,”203United States v. Monteiro, 407 F. Supp. 2d 351, 371 (D. Mass. 2006). and “merely unconstrained subjectivity masquerading as objectivity.”204Tibbs, 2019 D.C. Super. LEXIS 9 at *69. And as many courts have pointed out, “the AFTE standard is circular—an identification can be made upon sufficient agreement, and agreement is sufficient when an identification can be made.”205People v. Ross, 129 N.Y.S.3d 629, 634 (N.Y. Sup. Ct. 2020); see also United States v. Taylor, 663 F. Supp. 2d 1170, 1177 (D.N.M. 2009) (“[T]he AFTE theory is circular.”); Monteiro, 407 F. Supp. 2d at 370 (“[T]he AFTE Theory . . . is tautological.”); United States v. Green, 405 F. Supp. 2d 104, 114 (D. Mass. 2005) (stating the method is “either tautological or wholly subjective”). The inherent subjectivity has weighed against admissibility of firearms comparison evidence for many courts.206See, e.g., Romero-Lobato, 379 F. Supp. 3d at 1121 (“With the AFTE method, matching two tool marks essentially comes down to the examiner’s subjective judgment based on his training, experience, and knowledge of firearms. This factor weighs against admissibility.”); United States v. Ashburn, 88 F. Supp. 3d 239, 246–47 (E.D.N.Y. 2015) (discussing subjectivity); Ross, 129 N.Y.S.3d at 633 (describing testimony that “there is no across-the-board standard as to what is ‘sufficient agreement’ in his field”); United States v. Sebbern, No. 10 Cr. 87(SLT), 2012 U.S. Dist. LEXIS 170576, at *11 (E.D.N.Y. Nov. 30, 2012) (“[T]he standards employed by examiners invite subjectivity.”). Courts, however, have often also noted that they find such subjectivity “not fatal” to admissibility.207See Ashburn, 88 F. Supp. 3d at 246–47 (“[T]he subjectivity of a methodology is not fatal under Rule 702 and Daubert.”); Cohen v. Trump, 2016 U.S. Dist. LEXIS 117059, at *35 (S.D. Cal. Aug. 29, 2016) (“[S]ubjective opinions based on an expert’s experience in the industry [are] proper”); Romero-Lobato, 379 F. Supp. 3d at 1120 (“Federal Rule of Evidence 702 inherently allows for an expert with sufficient knowledge, experience, or training to testify about a particular subject matter.”). Thus, courts often note that subjectivity alone does not make a method unreliable and they are focused on evaluating reliability.208See, e.g., Romero-Lobato, 379 F. Supp. 3d at 1120 (“The mere fact that an expert’s opinion is derived from subjective methodology does not render it unreliable.”); United States v. Otero, 849 F. Supp. 2d 425, 431 (D.N.J. 2012) (“[E]xpert testimony on matters of a technical nature or related to specialized knowledge, albeit not scientific, can be admissible under Rule 702, so long as the testimony satisfies the Court’s test of reliability and the requirement of relevance.”).

Finally, the last Daubert factor hinges on general acceptance within the relevant scientific community. Who constitutes the “relevant” scientific community has never been defined with precision, yet it is often determinative. Because the AFTE method is accepted within the organization’s own community of firearms examiners, courts frequently find the requisite general acceptance.209See, e.g., United States v. Shipp, 422 F. Supp. 3d 762, 782 (E.D.N.Y. 2019) (“Most courts have, in cursory fashion, identified toolmark examiners as the relevant community, and have summarily determined that the AFTE Theory is generally accepted in that community.”). But other judges have pointed out that this narrow definition is composed exclusively of individuals “whose professional standing and financial livelihoods depend on the challenged discipline.”210United States v. Tibbs, No. 2016 CF1 19431, 2019 D.C. Super. LEXIS 9, at *73 (D.C. Super. Ct. Sept. 5, 2019); see also Shipp, 422 F. Supp. 3d at 783 (“The AFTE Theory has not achieved general acceptance in the relevant community.”). One court noted, “It is self evident that practitioners accept the validity of the method as they are the ones using it. Were the relevant scientific community limited to practitioners, every scientific methodology would be deemed to have gained general acceptance.” State v. Terrell, No. CR170179563, 2019 Conn. Super. LEXIS 827, at *14 (Conn. Super. Ct. Mar. 21, 2019). In other forensics fields, acceptance among only practitioners has been deemed unreliable and has led to the exclusion of the evidence under Daubert. See, e.g., United States v. Saelee, 162 F. Supp. 2d 1097, 1104 (D. Alaska 2001) (“[G]eneral acceptance of the theories and techniques involved in the field . . . among the closed universe . . . proves nothing.”). Thus, perhaps the relevant scientific community should be broadened to include nonpractitioner research scientists.

While acknowledging the discipline’s weaknesses, most federal courts have balanced the Daubert factors and found testimony admissible. As one federal court put it: “[T]his lack of objective criteria is countered by the method’s relatively low rate of error, widespread acceptance in the scientific community, testability, and frequent publication in scientific journals.”211Romero-Lobato, 379 F. Supp. 3d at 1122; see also Ricks v. Pauch, No. 17-12784, 2020 U.S. Dist. LEXIS 50109 (E.D. Mich. Mar. 23, 2020) (“Given that no court has ever found Firearm and Toolmark Identification evidence to be inadmissible under Daubert, it is clear that firearm identification testimony meets the Daubert reliability standards and can be admitted as evidence.” (quoting United States v. Alls, No. CR2-08-223 (S.D. Ohio Dec. 7, 2009))); United States v. Wrensford, No. 2013-0003, 2014 U.S. Dist. LEXIS 102446, at *57 (D.V.I. July 28, 2014) (finding “consistent with other courts—that the concerns with subjectivity as it may impact testability, standards, and protocols do not tip the scales against admissibility”). Further, as noted, Rule 702 was revised in 2000 to incorporate Daubert, but it specified additional factors, including asking courts to examine the application of a method to the facts in a case.212Fed. R. Evid. 702. Courts vary in whether they simply consider Daubert factors alone,213See, e.g., United States v. Chavez, No. 15-CR-00285-LHK-1, 2021 U.S. Dist. LEXIS 237830, at *17 (N.D. Cal. Dec. 13, 2021) (finding that four of five Daubert factors weighed in favor of admissibility). or whether they also discuss Rule 702—as will be discussed next, litigants have increasingly focused on the as-applied language in Rule 702, critiquing how the method was used, as well as on the language an expert used to express conclusions.

A.  Post-Daubert Cases

As a federal district court noted in 2005, for over a decade after the Daubert ruling, “every single court post-Daubert has admitted [firearms identification] testimony, sometimes without any searching review, much less a hearing.”214United States v. Green, 405 F. Supp. 2d 104, 108 (D. Mass. 2005) (emphasis omitted). When courts did examine firearms evidence, early post-Daubert challenges often focused on whether the expert’s qualifications were sufficient under Rule 702,215For a case affirming the disqualification of a defense, rather than a prosecution, expert, see State v. Hurst, 828 So. 2d 1165 (La. Ct. App. 2002). even as courts began to discuss questions regarding the reliability of the methods and principles used.216See, e.g., State v. Samonte, 928 P.2d 1, 26–27 (Haw. 1996) (discussing the defendant’s argument that prosecution’s firearms expert was not qualified); Whatley v. State, 509 S.E.2d 45, 50 (Ga. 1998) (rejecting the defendant’s argument that evidence used was “inherently unreliable” and noting the “ballistics evidence introduced in this case is not novel”). But see Sexton v. State, 93 S.W.3d 96, 101 (Tex. Crim. App. 2002) (rejecting expert’s claim that the technique was “one hundred percent accurate” and noting while the “underlying theory of toolmark examination could be reliable in a given case,” the use in this case on unfired bullets was not sufficiently established). Further cases considered—and rejected—arguments that an expert’s conclusions were based on inadmissible hearsay, rather than on the expert’s own observations and conclusions.217See State v. Montgomery, No. 94CA40, 1996 Ohio App. LEXIS 1361, at *14 (Ohio Ct. App. Mar. 29, 1996) (“While it is true that other colleagues provided [the expert] with information . . . the major part of his opinion was based on his own observations and expertise.”). And many courts, both state and federal, continued to admit the testimony without serious discussion.218See, e.g., State v. Gainey, 558 S.E.2d 463, 473–74 (N.C. 2002) (rejecting challenge to prosecution’s expert because of “extensive knowledge of the subject matter”); United States v. O’Driscoll, No. 4:CR-01-277, 2003 U.S. Dist. LEXIS 3370, at *4–6 (M.D. Pa. Feb. 10, 2003) (briefly rejecting challenge); United States v. Foster, 300 F. Supp. 2d 375, 376–77 (D. Md. 2004) (same). But for a particularly detailed review of application of Daubert factors to firearms comparison evidence, see United States v. Hicks, 389 F.3d 514, 526 (5th Cir. 2004).

In additional cases, judges dismissed objections that firearms experts’ testimony reached “ultimate issues,” noting that the experts opined only to an acceptable “reasonable scientific certainty.”219State v. Riley, 568 N.W.2d 518, 526 (Minn. 1997). Thus, judges have emphasized the flexibility of the Daubert and Kumho Tire220Kumho Tire Co. v. Carmichael, 526 U.S. 137, 152–53 (1999) (setting out the application of Daubert to expert testimony by nonscientists). standards. As a Southern District of New York ruling explained:

The Court has not conducted a survey, but it can only imagine the number of convictions that have been based, in part, on expert testimony regarding the match of a particular bullet to a gun seized from a defendant or his apartment. It is the Court’s view that the Supreme Court’s decisions in Daubert and Kumho Tire, did not call this entire field of expert analysis into question. It is extremely unlikely that a juror would have the same experience and ability to match two or more microscopic images of bullets.221United States v. Santiago, 199 F. Supp. 2d 101, 111–12 (S.D.N.Y. 2002).

B.  Growing Judicial Skepticism

The federal courts took the lead in beginning to scrutinize firearms comparison testimony more closely. Judges began to write opinions with detailed examinations of the underlying methods experts used. Federal courts then imposed partial exclusions regarding either (1) the methods or qualifications of the particular experts or (2) the language the expert was permitted to use to describe the conclusion. In more recent years, state courts, including trial courts, have joined federal courts in asking more detailed questions and limiting uses of firearms expert testimony.

A turning point was the District of Massachusetts ruling in United States v. Green.222United States v. Green, 405 F. Supp. 2d 104 (D. Mass. 2005). Then-Judge Gertner described how the firearms expert had planned to testify about individual characteristics that, the expert stated, could be matched “to the exclusion of every other firearm in the world.”223Id. at 107. At the opinion’s outset, the court stated that this conclusion was “extraordinary.”224Id. The court also gave one of the earliest detailed descriptions of the exactness—or lack thereof—of the toolmark comparison methodology:

In firearm toolmark comparisons, exact matches are rare. The examiner has to exercise his judgment as to which marks are unique to the weapon in question, and which are not.

In fact, shell casings have myriad markings, some of which appear on all casings from the same type of weapon (“class characteristics”) or those manufactured at the same time (“sub-class characteristics”). Others are arguably unique to a given weapon (“individual characteristics”) or are unique to a single firing (“accidental characteristics”).225Id.

Judge Gertner then explained:

The task of telling them apart is not an easy one. Even if the marks on all of the casings are the same, this does not necessarily mean they came from the same gun. Similar marks could reflect class or sub-class characteristics, which would define large numbers of guns manufactured by a given company. Just because the marks on the casings are different does not mean that they came from different guns. Repeated firings from the same weapon, particularly over a long period of time, could produce different marks as a result of wear or simply by accident.226Id.

Judge Gertner emphasized that in “distinguishing class and sub-class characteristics from individual ones,” the examiner “conceded, over and over again, that he relied mainly on his subjective judgment. There were no reference materials of any specificity, no national or even local database on which he relied.”227Id. Despite these concerns, the court candidly acknowledged that “the problem for the defense is that every single court post-Daubert has admitted this testimony, sometimes without any searching review, much less a hearing.”228Id. Judge Gertner ultimately allowed the expert testimony because “any other decision [would] be rejected by appellate courts, in light of precedents across the country.”229Id. at 109. Nevertheless, the court did not “allow [the expert] to conclude that the match he found by dint of the specific methodology he used permits ‘the exclusion of all other guns’ as the source of the shell casings.”230Id. at 124.

In a second Massachusetts case, United States v. Monteiro, Judge Saris held—for the first time—that firearms comparison evidence was inadmissible on an as-applied challenge under Rule 702.231United States v. Monteiro, 407 F. Supp. 2d 351, 375 (D. Mass. 2006). Because of “the extensive documentary record,” the court held that the “underlying scientific principle behind firearm identification—that firearms transfer unique toolmarks to spent cartridge cases—is valid under Daubert.”232Id. at 355. At the same time, Judge Saris noted that the “process of deciding that a cartridge case was fired by a particular gun is based primarily on a visual inspection” that is “largely a subjective determination.”233Id. (emphasis added). Because of this subjectivity, a testifying examiner must “follow the established standards for intellectual rigor in the toolmark identification field with respect to documentation of the reasons for concluding there is a match (including, where appropriate, diagrams, photographs or written descriptions), and peer review of the results by another trained examiner in the laboratory.”234Id. Ultimately, the court concluded that even though the methodology could be reliable and even though the examiner was qualified based on his training and experience, the expert’s opinion was inadmissible because the expert did not sufficiently comply with proper peer review and documentation requirements.235Id. The Government, however, was allowed—without prejudice—to resubmit evidence of the test results that complied with the standards in the field. Id.

Other federal courts began to follow the approach of these Massachusetts rulings. In 2007, the Northern District of California held that an expert could testify only to a “reasonable degree of certainty in the ballistics field.”236United States v. Diaz, No. CR 05-00167 WHA, 2007 U.S. Dist. LEXIS 13152, at *3 (N.D. Cal. Feb. 12, 2007). But the court commented:

[I]t is important to note that—at least according to this record—there has never been a single documented decision in the United States where an incorrect firearms identification was used to convict a defendant. This is not to say that examiners do not make mistakes. The record demonstrates that examiners make mistakes even on proficiency tests. But, in view of the thousands of criminal defendants who have had an incentive to challenge firearms examiners’ conclusions, it is significant that defendants cite no false-positive identification used against a criminal defendant in any American jurisdiction.237Id. at *41.

Other federal courts, however, instead continued to admit conclusions given with “100% degree[s] of certainty.”238United States v. Natson, 469 F. Supp. 2d 1253, 1261 (M.D. Ga. 2007); see also United States v. Williams, 506 F.3d 151, 161 (2d Cir. 2007) (discussing United States v. Santiago, 199 F. Supp. 2d 101 (S.D.N.Y. 2002), and agreeing that firearms comparison testimony remains proper). For a state court case discussing Monteiro and emphasizing that California admissibility standards are different, see People v. Gear, No. C049666, 2007 Cal. App. Unpub. LEXIS 6454 (Cal. Ct. App. Aug. 8, 2007). The next shift occurred after the scientific community produced substantial reports raising new reliability questions.

C.  The 2008, 2009, and 2016 Scientific Reports

Over half of the rulings in our database occurred after 2009, when the National Academy of Sciences released a groundbreaking report concerning forensic evidence. To be sure, commercial legal databases may have a greater concentration of more recent appellate rulings. But one might have expected a similar outpouring of judicial rulings after the Daubert ruling in 1993—a fairly modern opinion. Instead, we observe change following an intervention by the scientific community over a decade and a half later.

During this time, the separate field of comparative bullet lead analysis—in which examiners claimed to use chemistry to identify the unique elemental makeup of a bullet—was discredited and abandoned by the FBI after the NAS found it lacked any scientific foundation.239Nat’l Rsch. Council, Forensic Analysis: Weighing Bullet Lead Evidence 6 (2004). The NAS is “a private, nonprofit, self-perpetuating society of distinguished scholars engaged in scientific and engineering research, dedicated to the furtherance of science and technology and to their use for the general welfare.”240See 2008 NAS Report, supra note 40, at iii. Indeed, even before the report, courts had begun to exclude such evidence.241See, e.g., Clemons v. State, 896 A.2d 1059, 1074–79 (Md. 2006); Ragland v. Commonwealth, 191 S.W.3d 569, 574–80 (Ky. 2006). While it was a very different discipline, those developments may have raised further concerns in the judiciary regarding the work of firearms examiners.

In a 2008 report focused on the feasibility of a national ballistic imaging database, the NAS concluded that the underlying assumptions of firearms comparisons were not yet validated.2422008 NAS Report, supra note 40, at 3 (“The validity of the fundamental assumptions of uniqueness and reproducibility of firearms-related toolmarks has not yet been fully demonstrated.”). Furthermore, “a significant amount of research” would need to be done to determine what characteristics might allow one to establish a probative connection between pieces of firearms evidence.243Id.; see also United States v. Taylor, 663 F. Supp. 2d 1170, 1175 (D.N.M. 2009) (describing the scope of that report which focused on feasibility of a ballistics database but noting that the question “was inextricably intertwined with the question of ‘whether a particular set of toolmarks can be shown to come from one weapon to the exclusion of all others’ ”).

In 2009, the NAS released its landmark report, Strengthening Forensic Science in the United States, after Congress, recognizing that substantial improvements were needed in the field of forensic science, directed the NAS to undertake the study.244See 2009 NAS Report, supra note 41, at xix. The 2009 NAS Report contains a scientific assessment of a variety of forensic science disciplines along with recommendations for improvements in each discipline and to the forensic system as a whole. The Committee assembled by the NAS included prominent forensic scientists, research scientists, lawyers, and judges.245See id. at xix–xx. The Report identified a wide range of methodological issues with the practices of forensic firearm and toolmark identification.

Although the NAS Report did acknowledge that class characteristics are helpful in narrowing the pool of firearms that may have fired a particular bullet or cartridge case, it recognized that firearm examiners necessarily go beyond class characteristics when making an identification. The Report noted that a “fundamental problem with toolmark and firearms analysis is the lack of a precisely defined process”246Id. at 155. and that the AFTE methodology “does not even consider, let alone address, questions regarding variability, reliability, repeatability, or the number of correlations needed to achieve a given degree of confidence.”247Id. The Report concluded that “[b]ecause not enough is known about the variabilities among individual tools and guns, [firearm examiners are] not able to specify how many points of similarity are necessary for a given level of confidence in the result.”248Id. at 154.

Building on this work by the NAS, the President’s Council of Advisors on Science and Technology (“PCAST”) published its 2016 report on the use of forensic science in criminal proceedings. The report was a response to President Obama’s question “whether there [we]re additional steps on the scientific side, [in addition to those identified in the 2009 NAS Report], that could help ensure the validity of forensic evidence used in the Nation’s legal system.”249PCAST Report, supra note 43, at x. The advisory group consisted of “leading scientists and engineers, appointed by the President to augment the science and technology advice available to him from inside the White House, and from cabinet departments and from other Federal agencies.”250Id. at iv. The group focused on six feature-comparison methods including firearms-comparison evidence.251Id. at 7.

Consulting with forensic scientists, PCAST reviewed more than two thousand studies from various disciplines.252Id. at 2. The field had responded to the NAS reports by conducting new studies, and PCAST undertook a deep examination of them. As the NAS had done in its 2009 Report, PCAST asked whether each discipline met basic requirements for scientific validity, which consists of both “foundational validity”—whether the method can, in principle, be reliable—and “validity as applied”—whether the method has been reliably applied in practice.253Id. at 47–48, 56–58.

To be foundationally valid, a method must have been subject to “empirical testing by multiple groups, under conditions appropriate to its intended use.”254Id. at 5. Specifically, “the procedures that comprise it must be shown, based on empirical studies, to be repeatable, reproducible, and accurate, at levels that have been measured and are appropriate to the intended application.”255Id. at 47. The studies must also provide “valid estimates of the method’s accuracy,” demonstrating how often an examiner is likely to draw the wrong conclusion even when applying the method correctly (that is, a scientifically valid error rate).256Id. at 5. As PCAST explained, “Without appropriate estimates of [the method’s] accuracy, an examiner’s statement that two samples are similar—or even indistinguishable—is scientifically meaningless: it has no probative value, and considerable potential for prejudicial impact.”257Id. at 6.

Ultimately, as described below, PCAST concluded that only one of the existing studies used an appropriate design to truly test the ability of a firearm examiner to make accurate identifications. PCAST went on to conclude that “[b]ecause there has been only a single appropriately designed study, the current evidence falls short of the scientific criteria for foundational validity.”258Id. at 111. Much like the NAS report that preceded it, PCAST pointed to the necessity of additional, appropriately designed studies to test the validity of firearm examination.259Id.

1.  Evaluation of the Scientific Studies

PCAST divided the firearms identification studies it reviewed into two different types: set-to-set studies and sample-to-sample studies. In a set-to-set study, examiners are given two sets of bullets and then asked to link the first set of bullets to the second set of bullets. In a sample-to-sample study, examiners are given two bullets to compare and are asked to judge whether the bullets were fired by the same gun or not. This process is then repeated for other test sets of bullets. PCAST concluded that “set-based studies are not appropriately-designed black-box studies from which one can obtain proper estimates of accuracy.”260Id. at 106.

The principal problem with set-to-set studies is that test takers can leverage the design to draw inferences about other comparisons, making the task fundamentally unlike real-world comparison work.261United States v. Cloud, 576 F. Supp. 3d 827, 842–43 (E.D. Wash. 2021) (“Such studies lack external validity, as examiners conducting real-world comparisons have neither the luxury of knowing a true match is somewhere in front of them nor of making process-of-elimination-type inferences to reach their conclusions.”). For example, if an examiner identifies a match between bullets one and two, and then determines that bullet one and bullet A match, then bullet two and bullet A must also be a match by implication. Thus, a test taker would get a correct response for linking unknown bullet two to known bullet A despite never directly comparing the bullets. PCAST noted that “[t]he Director of the Defense Forensic Science Center analogized set-based studies to solving a ‘Sudoku’ puzzle, where initial answers can be used to help fill in subsequent answers.”262PCAST Report, supra note 44, at 106. Because of this, set-to-set studies typically yield error rates of zero and very few inconclusive responses.263Id.
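To make the elimination logic concrete, the following is a minimal sketch in Python; the bullet labels, the three-gun closed set, and the examiner’s two direct calls are all hypothetical, chosen only to show how a final attribution can be scored as correct without any comparison at all.

# Hypothetical closed-set design: each questioned bullet is known to match
# exactly one of the known, test-fired bullets (one per gun).
knowns = {"A", "B", "C"}       # test fires, one per gun
unknowns = ["1", "2", "3"]     # questioned bullets

# Suppose the examiner makes two direct identifications under the microscope.
direct_calls = {"1": "A", "2": "B"}

# By elimination, the one remaining unknown must pair with the one remaining
# known, so the answer can be recorded without examining the bullets.
remaining_knowns = knowns - set(direct_calls.values())
remaining_unknowns = [u for u in unknowns if u not in direct_calls]

if len(remaining_knowns) == 1 and len(remaining_unknowns) == 1:
    inferred_call = {remaining_unknowns[0]: remaining_knowns.pop()}
    print("Scored as correct without any comparison:", inferred_call)
    # prints: Scored as correct without any comparison: {'3': 'C'}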

At the time PCAST conducted its analysis, there was only a single sample-to-sample study available for firearms identification. The unpublished study was conducted by researchers at the Ames Laboratory in Iowa.264David P. Baldwin, Stanley J. Bajic, Max Morris & Daniel Zamzow, A Study of False-Positive and False-Negative Error Rates in Cartridge Case Comparisons (2014) [hereinafter Ames I]. In this first Ames Lab study, 218 firearm examiners were mailed a test packet that contained cartridge cases to examine. Each test packet comprised 15 separate comparisons for the examiners to evaluate. Unbeknownst to the participants, 10 of the comparisons were different-source comparisons, for which the correct response was elimination, and 5 were same-source comparisons, for which the correct response was identification. Examiners were instructed to work alone on the test and to follow the AFTE protocol.

The study reported a 1.01% false positive error rate.265Id. at 3. What was not stated explicitly in the study is that 33.7% of the responses were deemed inconclusive—a pattern of results wildly at odds with the results from the set-to-set studies.266Id. at 16. There were 2,180 different-source comparisons, of which 735 were inconclusive (735 / 2,180 = 33.7%). Compare that figure to a well-known set-to-set study by Hamby, which reported only 8 inconclusives—0.1%—out of 7,605 comparisons. J.E. Hamby, David J. Brundage & James W. Thorpe, The Identification of Bullets Fired From 10 Consecutively Rifled 9mm Ruger Pistol Barrels: A Research Project Involving 507 Participants from 20 Countries, 41 AFTE J. 99 (2009). PCAST noted that “the closed-set studies show a dramatically lower rate of inconclusive examinations and of false positives. With this unusual design, examiners succeed in answering all questions and achieve essentially perfect scores. In the more realistic open designs, these rates are much higher.”267PCAST Report, supra note 44, at 110. PCAST was not the first group to point out the shortcomings of set-to-set studies. The Ames study, for example, stated,

Several previous studies have been carried out to examine this and related issues of individualization and durability of marks [1-5], but the design of these previous studies, whether intended to measure error rates or not, did not include truly independent sample sets that would allow the unbiased determination of false-positive or false-negative error rates from the data in those studies.

Ames I, supra note 264, at 4.

One federal district court, in extensively discussing the PCAST report findings, noted, “Based on the above information, the court finds that the potential rate of error for matching ballistics evidence based on the AFTE Theory does not favor a finding of reliability at this time.”268United States v. Shipp, 422 F. Supp. 3d 762, 778–79 (E.D.N.Y. 2019). The court noted, however, that the FBI and the Ames Laboratory were “currently conducting a second black box study on the AFTE Theory.”269Id. at 779. That study was posted online in early 2021 (and subsequently removed from the Internet).270Components of the Ames II study still appear online. See L. Scott Chumbley, Max D. Morris, Stanley J. Bajic, Daniel Zamzow, Erich Smith, Keith Monson & Gene Peters, Accuracy, Repeatability, and Reproducibility of Firearms Comparisons Part I: Accuracy, https://arxiv.org/ftp/arxiv/papers/2108/2108.04030.pdf [https://perma.cc/EJB6-E434].

The FBI/Ames Laboratory study (hereinafter “Ames II”) utilized a design ambitious in size and scope. First, the study contained both cartridge case and bullet comparisons. The vast majority of previous firearms comparison studies examined only cartridge cases. Second, the study consisted of three rounds that attempted to measure accuracy (round one), repeatability (round two), and reproducibility (round three). Repeatability refers to “the ability of an examiner, when confronted with the exact same comparison once again, to reach the same determination as when first examined.”271Stanley J. Bajic, L. Scott Chumbley, Max Morris & Daniel Zamzow, U.S. Dep’t of Just., Report: Validation Study of the Accuracy, Repeatability, and Reproducibility of Firearm Comparisons, Ames Laboratory 10 (2020) [hereinafter Ames II] (on file with authors). Reproducibility refers to “the ability of a second examiner to evaluate a set previously viewed by a different examiner and reach the same conclusion.”272Id. at 11. No other study had attempted to measure repeatability and reproducibility of firearm examiner judgments.

In round one of the study, 256 active firearm examiners were sent test packets—each test packet contained 15 comparison sets of bullets and 15 comparison sets of cartridge cases. For each comparison, participants were instructed to make a judgment according to the AFTE Range of Conclusions.273The Range of Conclusions includes the following options: (1) Identification, (2a) Inconclusive-A, (2b) Inconclusive-B, (2c) Inconclusive-C, (3) Elimination, and (4) Unsuitable. See Ames I, supra note 264, at 7. Participants were admonished not to discuss their results with anyone else. Only 173 of the 256 participants, however, returned their test packets. According to the authors, “the overall rate of false positive error rate was estimated as 0.656% and 0.933% for bullets and cartridge cases, respectively, while the rate of false‐negatives was estimated as 2.87% and 1.87% for bullets and cartridge cases, respectively.”274Ames II, supra note 271, at 2. Here again, there was an enormous number of inconclusive responses: over 50% of the bullet comparisons were deemed inconclusive, and over 42% of the cartridge comparisons were deemed inconclusive.275Id. at 35.

In round two of the study, participants were sent the same test packet they examined previously. Only 105 participants completed this round.276Id. at 39. The percentage of time that examiners reached the same conclusion in round one and round two ranged from 79% to 62%.277Id. at 39. This does not necessarily mean the examiner reached the correct conclusion about two-thirds of the time; rather, it only suggests she reached the same conclusion about two-thirds of the time. According to the authors, a statistical test comparing the “observed agreement” between conclusions reached in round one and in round two to the “expected agreement” “indicat[ed] ‘better than chance’ repeatability.”278Id. at 45. However, two different statisticians concluded that: “[t]he level of repeatability and reproducibility as measured by the between rounds consistency of conclusions would not appear to support the reliability of firearms examination.”279Alan H. Dorfman & Richard Valliant, A Re-analysis of Repeatability and Reproducibility in the Ames-USDOE-FBI Study, 9 Stat. & Pub. Pol’y 175, 178 (2020).

Only 80 participants completed round three of the study.280Ames II, supra note 271, at 15. The percentage of time that two different participants examined the same test set and reached the same conclusion ranged from 68% to 31%.281Id. at 47. These latter results are striking. Less than one-third of the time, two different participants looked at the same bullets and reached the same conclusion. This means that over two-thirds of the time (69.1%), two different participants reached different conclusions when examining the same set of bullets. A statistical test revealed “better than chance” agreement for same-source bullet comparisons but not different-source bullet comparisons.282Id. at 52.
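For readers unfamiliar with these measures, the following is a minimal sketch in Python of how such a percent-agreement figure is computed; the calls are hypothetical, drawn from an abbreviated version of the AFTE Range of Conclusions. As the discussion above emphasizes, agreement measures only consistency, not accuracy: two identical but incorrect calls still count as agreement.

# Hypothetical repeatability check: one examiner's calls on the same six
# comparison sets in round one and round two of a study.
def percent_agreement(calls_a, calls_b):
    """Share of comparison sets on which the two lists of calls agree."""
    assert len(calls_a) == len(calls_b)
    return sum(a == b for a, b in zip(calls_a, calls_b)) / len(calls_a)

# Abbreviated AFTE Range of Conclusions: ID, INC-A/B/C, ELIM.
round_one = ["ID", "INC-B", "ELIM", "INC-A", "ID", "ELIM"]
round_two = ["ID", "INC-C", "ELIM", "ID", "ID", "INC-B"]

print(f"{percent_agreement(round_one, round_two):.0%} agreement")  # 50% agreement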

D.  Litigating the Error Rate Studies

The conclusion reached by PCAST that “firearms analysis currently falls short of the criteria for foundational validity”283PCAST Report, supra note 43, at 112. did not go unnoticed by the defense bar. Admissibility challenges to firearm examiner testimony surged—we include more than eighty such cases in our database.284For recent cases in which the defendant challenged firearms testimony, see People v. Ross, 129 N.Y.S.3d 629, 639 (Sup. Ct. 2020); United States v. Tibbs, No. 2016 CF1 19431, 2019 D.C. Super. LEXIS 9 (D.C. Super. Ct. Sept. 5, 2019); United States v. Davis, No. 4:18-cr-00011, 2019 U.S. Dist. LEXIS 155037 (W.D. Va. Sept. 11, 2019); United States v. Shipp, 422 F. Supp. 3d 762 (E.D.N.Y. 2019); United States v. Johnson, No. (S5) 16 Cr. 281 (PGG), 2019 U.S. Dist. LEXIS 39590 (S.D.N.Y. Mar. 11, 2019), aff’d, 861 F. App’x 483 (2d Cir. 2021); United States v. Romero-Lobato, 379 F. Supp. 3d 1111 (D. Nev. 2019); State v. Terrell, No. CR170179563, 2019 Conn. Super. LEXIS 827 (Conn. Super. Ct. Mar. 21, 2019); United States v. Simmons, No. 2:16cr130, 2018 U.S. Dist. LEXIS 18606 (E.D. Va. Jan. 12, 2018). These challenges often summarized the PCAST analyses and conclusions in arguing that the field failed to pass muster under Daubert. These challenges, however, almost universally failed. Critics of PCAST sought to characterize the report as authored by outsiders who failed to learn the fundamentals of firearm examination and who committed numerous errors in their own analysis.285For example, the Organization of Scientific Area Committees (“OSAC”) Firearms and Toolmarks Subcommittee issued a formal response in which it claims to catalog “[e]rrors and [o]missions in PCAST [s]ummaries of [f]irearms and [t]oolmarks [v]alidation [s]tudies.” Org. of Sci. Area Comms. (OSAC) Firearms & Toolmarks Subcomm., Response to the President’s Council of Advisors on Science and Technology (PCAST) Call for Additional References Regarding its Report “Forensic Science in the Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods” 11 (2016). See also Ass’n of Firearm & Tool Mark Examiners, Response to Seven Questions Related to Forensic Science Posed on November 30, 2015 by The President’s Council of Advisors on Science and Technology (PCAST) (2015). But the tides have recently begun to shift, as courts have imposed new, albeit still limited, restrictions on the type of testimony firearm examiners may offer and how they express conclusions.286See, e.g., Tibbs, 2019 D.C. Super. LEXIS 9; Ross, 129 N.Y.S.3d 629. We have identified thirty-seven judicial rulings imposing limitations on firearms comparison testimony and set out each in Appendix A.

Two factors have contributed to the shifting tides. First, in addition to citing the NAS and PCAST reports, attorneys have called mainstream research scientists to testify generally about scientific methods and principles and specifically about the discipline of firearm examination. These experts are not firearm examiners and typically have never conducted a firearm examination.287See generally, e.g., Faigman et al., supra note 45. Much like the practitioner/researcher distinction in medicine, these experts are researchers who study whether the methods employed by the practitioners are effective. These experts are poised to evaluate claims made in court regarding scientific practices.288For example, judges are supposed to consider whether research appears in a “peer-reviewed” scientific journal. See supra notes 189–192 and accompanying text. Most research on firearm examination is published in the AFTE Journal, which is touted in court as a “peer-reviewed scientific” journal. See AFTE J., https://afte.org/afte-journal. Upon closer inspection, however, the peer-review process used by the AFTE Journal is highly dissimilar to the usual process that occurs at scientific journals. See Tibbs, 2019 D.C. Super. LEXIS 9, at *25.

The second major factor concerns additional scrutiny of the PCAST-reviewed studies, which has called into question the reported error rates and the utility of the validation studies. As noted, one-third of the responses in the Ames I study were inconclusive.289See supra note 266 and accompanying text.

What ought to be done with those responses? PCAST ultimately calculated the error rate without considering them. Other firearm studies actually count inconclusive responses as correct responses, based on the logic that “an inconclusive response is not an incorrect response [so they are] totaled with the correct response and figured into the error rate as such.”290Dennis J. Lyons, The Identification of Consecutively Manufactured Extractors, 41 AFTE J. 246, 255 (2009). But what if those responses are in fact errors? Counted that way, the error rate in the Ames I study would be as high as 35%. Other sample-to-sample studies conducted after the PCAST analyses have reported rates of inconclusive responses over 50%.291See Ames II, supra note 271, at 35. Clearly, determining how to count over half of the responses in a validation study is critical.
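
To make the stakes of this scoring choice concrete, consider a short sketch that works through the arithmetic with round hypothetical counts, chosen only to mirror the proportions described above rather than the actual Ames I data:

```python
# Illustrative arithmetic only: round hypothetical counts that mirror the
# proportions described in the text, not the actual Ames I data.
conclusive_errors = 10    # false identifications or eliminations
conclusive_total = 650    # responses that were conclusive
inconclusive = 350        # roughly one-third of all responses
total = conclusive_total + inconclusive

# Convention 1: drop inconclusives from the calculation (as PCAST did).
rate_dropped = conclusive_errors / conclusive_total          # ~1.5%

# Convention 2: count inconclusives as correct (as some AFTE studies do).
rate_as_correct = conclusive_errors / total                  # 1.0%

# Convention 3: count inconclusives as errors.
rate_as_errors = (conclusive_errors + inconclusive) / total  # 36.0%

print(f"dropped: {rate_dropped:.1%}; as correct: {rate_as_correct:.1%}; "
      f"as errors: {rate_as_errors:.1%}")
```

The same set of responses thus yields a reported error rate anywhere from about one percent to over a third, depending entirely on the convention chosen.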

There are many legitimate reasons to count the inconclusive responses in the Ames I study as errors, including the fact that “[t]he fraction of samples reported as inconclusive cannot be attributed to a large fraction of poorly marked knowns or questioned samples in this group”292Ames I, supra note 264, at 19. and the fact that AFTE defines an inconclusive response as reflecting markings of insufficient quality to reach an identification or elimination.293AFTE Range of Conclusions, Ass’n of Firearm & Tool Mark Examiners, https://afte.org/about-us/what-is-afte/afte-range-of-conclusions [https://perma.cc/EJB6-E434] (last visited July 29, 2022). As noted in a 2020 scientific article, a proper study design would include inconclusive test items so that inconclusive responses could be evaluated and incorporated into the error rate.294Itiel E. Dror & Nicholas Scurich, (Mis)use of Scientific Measurements in Forensic Science, Forensic Sci. Int’l: Synergy 333, 335–36 (2020). No study has yet done so and, as a result, error rates observed in the studies span a range so large as to be wholly unhelpful—anywhere from one percent to over fifty percent, depending on whether the responses are dropped or counted as erroneous. Thus, as one district court recently put it,

But providing examiners in the study setting the option to essentially “pass” on a question, when the reality is that there is a correct answer—the casing either was or was not fired from the reference firearm—fundamentally undermines the study’s analysis of the methodology’s foundational validity and that of the error rate.295United States v. Cloud, 576 F. Supp. 3d 827, 843 (E.D. Wash. 2021).

This crucial issue of inconclusive responses was never considered prior to Tibbs, discussed earlier in this Article,296See United States v. Tibbs, No. 2016 CF1 19431, 2019 D.C. Super. LEXIS 9, at *56–66 (D.C. Super. Ct. Sept. 5, 2019) (discussing the issue of inconclusiveness in an order following an admissibility hearing). in which a defense expert raised the concern during an admissibility hearing. The judge in Tibbs called it “perhaps [the] most substantial issue related to the studies proffered to support the reliability of firearms and toolmark analysis”297Id. at *56–57. and noted that “the methods used in the proffered laboratory studies make a compelling case that inconclusive should not be accepted as a correct answer in these studies.”298Id. at *57–58. To be sure, in one 2020 Washington, D.C. case in which no defense expert was presented to explain these error rate issues, the judge discounted those concerns.299See United States v. Harris, 502 F. Supp. 3d 28, 35 (D.D.C. 2020).

By contrast, in another 2020 case, a judge in Oregon limited the admissibility of firearms testimony even without the benefit of a defense expert witness.300United States v. Adams, 444 F. Supp. 3d 1248 (D. Or. 2020). The judge expressed major concerns about inconclusive responses in firearms comparison studies and their impact on reported error rates:

It appears to be the case that the only way to do poorly on a test of the AFTE method is to record a false positive. There seems to be no real negative consequence for reaching an answer of inconclusive. Since the test takers know this, and know they are being tested, it at least incentivizes a rate of false positives that is lower than real world results. This may mean the error rate is lower from testing than in real world examinations.301Id. at 1265.

Beyond the inconclusive-response issue, a litany of other concerns has been raised about the error rate studies. We discuss four important issues here.

First, and most fundamentally, none of the studies were test-blind—the participants knew that they were being tested. There is powerful evidence that human subjects are predictably biased—and behave differently—when they know that they are being tested. The PCAST report emphasized the need for blind testing of forensic techniques.302PCAST Report, supra note 43, at 58–59. So have a host of researchers, based on a large body of research documenting the ways in which cognitive biases can lead forensic examiners to make errors.303See generally, e.g., Itiel E. Dror, Cognitive and Human Factors in Expert Decision Making: Six Fallacies and the Eight Sources of Bias, 92 Analytical Chemistry 7998 (2020). Although blind testing is standard in medicine, it has never been standard in forensic error rate studies.

Second, many of the volunteer participants in both of the Ames studies either dropped out or began but did not complete the test. In the Ames II study, “32% of the 256 examiners receiving their first packets failed to report any results, and another 32% of the 256 dropped out before completing all six mailings.”304Alan H. Dorfman & Richard Valliant, Inconclusives, Errors, and Error Rates in Forensic Firearms Analysis: Three Statistical Perspectives, 5 Forensic Sci. Int’l: Synergy 1, 5 (2022). No analysis was conducted of the participants who initiated the study but did not complete it.305Id. Attrition bias due to nonrandom dropout is a serious concern with an unknown impact on the reported error rates. Although one court has noted that the “use of volunteers . . . does not provide the clearest indication of the accuracy of the conclusions that would be reached by average toolmark examiners,”306United States v. Tibbs, No. 2016 CF1 19431, 2019 D.C. Super. LEXIS 9, at *47–48 (D.C. Super. Ct. Sept. 5, 2019). courts have not focused on issues related to participant dropout.
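
A purely hypothetical sketch illustrates why nonrandom dropout matters. The completion figures below track Ames II, where roughly 36% of examiners finished; both error rates are invented for illustration, because the studies collected no data on how dropouts would have performed:

```python
# Hypothetical illustration of attrition bias. Only the completion shares
# track Ames II; both error rates are invented, since no study has
# measured how dropouts would have performed.
completer_share = 0.36        # roughly 36% of examiners finished
dropout_share = 0.64          # the rest never reported or quit partway

completer_error_rate = 0.01   # e.g., a reported ~1% rate among completers
dropout_error_rate = 0.05     # unknown in reality; suppose dropouts err more

# Error rate for the full pool of volunteers, had everyone finished:
pool_rate = (completer_share * completer_error_rate
             + dropout_share * dropout_error_rate)
print(f"reported among completers: {completer_error_rate:.1%}; "
      f"implied pool-wide rate under this assumption: {pool_rate:.2%}")
```

Under these invented numbers, the error rate for the full pool of volunteers would be more than triple the reported rate. The real effect could be smaller, larger, or absent; without data on the dropouts, no one can say.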

Third, there are also questions about whether the materials used in the studies, such as the types of firearms and the quality of the fired items, are sufficiently representative to support inferences about the field writ large. By design, studies should vary in difficulty, but unfortunately, “[w]ith a few exceptions, each of the forensic firearms studies to date focuses on a single firearm,” and the exceptions are telling: studies that used different types of firearms have produced very different error rates for each type.307See Dorfman & Valliant, supra note 304, at 5 (“The few studies that have carried out comparisons over a variety of guns have displayed marked differences in the ease of coming to correct conclusions.”). Further, if in a study “an examiner is over and over comparing bullets or cartridge cases from the same brand and model, then he or she can be expected to be picking up nuances along the way. A later comparison will have an advantage over the first. We can expect this to lead to a reduction in sample error rates.”308Id. Unlike in other forensic identification fields, none of these studies has used technology or databases to ensure that the test items are challenging.309Nicholas Scurich, Inconclusives in Firearm Error Rate Studies are Not “a Pass,” L. Probability & Risk (2022) (“[R]esearchers should intentionally select challenging test items, in a manner similar to Professor Koehler’s exemplary fingerprint examiner study involving ‘close non-matches.’ ”). Nor has there been any careful analysis of how representative or challenging these studies are, and this basic problem has not received the judicial attention that it should.

Finally, judges have not focused on the appalling levels of nonrepeatability and nonreproducibility of firearms work in the Ames II study: “[E]xaminers examining the same material twice, disagree[d] with themselves between 20% and 40% of the time.”310Ames II, supra note 271, at 39 tbl.XI; Dorfman & Valliant, supra note 304, at 6. They disagreed with other examiners even more, up to 69% of the time for nonmatching bullets and up to 60% of the time for nonmatching cartridges.311Dorfman & Valliant, supra note 304, at 6. Although there is spirited debate about whether inconclusive results constitute errors in a study, these rates of intra- and inter-examiner inconsistency should eclipse that entire discourse: they set a ceiling on validity and cannot be dismissed as a disagreement about the interpretation of inconclusive responses. Yet, likely because Daubert explicitly mentions error rates, not rates of consistency, courts have yet to grapple with these findings or with how they can be reconciled with professed error rates of one percent or less.
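
The logic behind that ceiling can be made precise with one assumption: if each test item has exactly one correct answer (the casing either was or was not fired by the reference firearm) and any other response is scored as an error, then whenever an examiner answers the same item differently on two attempts, at least one of the two answers must be wrong. A minimal sketch of the resulting floor on the error rate:

```python
# Lower bound on the error rate implied by test-retest disagreement, under
# the assumption that each item has exactly one correct answer and any
# other response (including an inconclusive) is scored as an error.
def error_rate_floor(disagreement_rate: float) -> float:
    # When two attempts at the same item disagree, at most one answer can
    # match the single correct answer, so at least half of the answers in
    # discordant pairs must be wrong.
    return disagreement_rate / 2

# Ames II reported self-disagreement rates between 20% and 40%:
for d in (0.20, 0.30, 0.40):
    print(f"{d:.0%} self-disagreement implies at least "
          f"{error_rate_floor(d):.0%} of responses erroneous")
```

Even on this accounting, self-disagreement of 20% to 40% implies that at least 10% to 20% of responses were erroneous, an order of magnitude above the one-percent error rates professed in court.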

All of this said, it is not uncommon for judges to respond dismissively to these studies, the PCAST report, and critiques from research scientists. Judges have commonly relied on precedent and conflated acceptance among practitioners with scientific validity. For example, one judge in New York state—a Frye jurisdiction where the standard for expert evidence admissibility is the “general acceptance” of the method within the relevant scientific community—recently emphasized that the acceptance of firearms comparison methods within the community of practitioners is “nearly universal”312State v. Vasquez, No. 2203/2019, at 3 (N.Y. Sup. Ct., July 24, 2022). According to this judge, the relevant scientific community is not “experts in ‘scientific methodology,’ which is to say, scientists,” id. at 2, but rather “trained and accredited experts in the field of microscopic ballistics and forensic firearm and toolmark examination” as well as “non-firearm practitioners enumerated in the multiple validation studies that have been conducted to demonstrate the reliability of the discipline and its examination results,” id. at 3 (emphasis added). Conducting a study to demonstrate a result is not good science. and that “the Appellate Division . . . has repeatedly upheld the admission of ballistics expert testimony without the need for a Frye hearing.”313Id. at 5. The judge went on to assert that “the PCAST report has been thoroughly discredited”314Id. at 4. and that “the very type of study called for by PCAST—a ‘black box study’—has, since the time of the PCAST report, been repeatedly utilized to validate firearm and toolmark comparison methodology.”315Id. at 5. There was no engagement with the results of those studies or their limitations. Unfortunately, it is common for judges to rely on precedent as a form of “general acceptance” by the courts rather than carefully examine the reliability of scientific evidence.316Stephanie L. Damon-Moore, Trial Judges and the Forensic Science Problem, 92 N.Y.U. L. Rev. 1532, 1564 (2017) (“Ironically, the ultimate safeguard against judicial error—appellate review—may actually discourage judges from gatekeeping effectively.”).

E.  Testimonial Limitations and Post-NAS and PCAST Rulings

In recent years, courts have more rigorously evaluated the field of firearms examination, in contrast to the more than fifty years in which claims made by firearm examiners regarding the foundational validity of their methods were uncritically accepted.317See, e.g., United States v. Shipp, 422 F. Supp. 3d 762, 775 (E.D.N.Y. 2019) (“Even though prior decisions have found toolmark analysis to be reliable, it is incumbent upon this court to thoroughly review the critiques of the AFTE Theory found in the NRC and PCAST Reports.”); United States v. Adams, 444 F. Supp. 3d 1248, 1266 (D. Or. 2020) (concluding that it could not “find that the AFTE method enjoys ‘general acceptance’ in the scientific community”); People v. Ross, 129 N.Y.S.3d 629, 641 (N.Y. Sup. Ct. 2020) (“[B]eyond comparing class characteristics forensic toolmark practice lacks adequate scientific underpinning and the confidence of the scientific community as whole.”). These more searching evaluations have led judges to note limitations and knowledge gaps that had rarely been discussed in judicial opinions. Despite increasing awareness of the limitations of the field, almost all courts have nevertheless found the evidence admissible.318Ricks v. Pauch, No. 17-12784, 2020 U.S. Dist. LEXIS 89453, at *29–32 (E.D. Mich. Mar. 23, 2020); see also United States v. Romero-Lobato, 379 F. Supp. 3d 1111, 1117 (D. Nev. 2019) (“[N]o federal court (at least to the Court’s knowledge) has found the AFTE method to be unreliable under Daubert.”); United States v. Davis, No. 4:18-cr-00011, 2019 U.S. Dist. LEXIS 155037, at *12–15 (W.D. Va. Sept. 11, 2019) (“[N]o federal court has outright barred testimony from a qualified firearm or toolmark identification expert.”). This created a new conundrum for courts: how to admit firearms identification evidence in a way that does not overstate its value or mislead the fact finder. In the Sections that follow, we describe the four tacks that courts have taken when admitting firearm examination evidence: (1) limiting the language that experts can use when testifying to their conclusions, (2) limiting conclusions to class characteristics only, (3) ruling that evidence concerning the proficiency of firearms experts is relevant to the preliminary question whether to qualify the expert, and (4) examining the as-applied question whether the method was reliably used in the particular case.

1.  Limiting Conclusion Testimony

While many courts have continued to admit firearms examiner testimony, “[m]any of these courts admitted the proffered testimony only under limiting instruction restricting the degree of certainty to which firearm and toolmark identification specialists may express their identifications.”319Davis, 2019 U.S. Dist. LEXIS 155037, at *15. The case law that has resulted is diverse, sometimes inconsistent, and reflects a gradual evolution of judicial approaches. As we will describe, in general, a range of courts have limited testimony based on the concerns about toolmark identification methodology.320See, e.g., Shipp, 422 F. Supp. 3d at 783 (preventing a toolmark expert from testifying “to any degree of certainty, that the recovered firearm is the source of the recovered bullet fragment or the recovered shell casing”); Adams, 444 F. Supp. 3d at 1266–67 (same); United States v. Monteiro, 407 F. Supp. 2d 351, 373 (D. Mass. 2006) (same); Davis, 2019 U.S. Dist. LEXIS 155037, at *24 (“[W]itnesses may not testify as to a ‘match,’ that the cartridges bear the same ‘signature,’ that they were fired by the same gun, or words to that effect.”); United States v. Glynn, 578 F. Supp. 2d 567, 575 (S.D.N.Y. 2008) (limiting testimony to “be stated in terms of ‘more likely than not,’ but nothing more”).

Earlier decisions held that an examiner could testify only in milder terms, forbidding aggressive statements of a match to “the exclusion of all other firearms in the world,”321United States v. Cazares, 788 F.3d 956, 989 (9th Cir. 2015); United States v. Taylor, 663 F. Supp. 2d 1170, 1180 (D.N.M. 2009); United States v. Ashburn, 88 F. Supp. 3d 239, 249 (E.D.N.Y. 2015); see also United States v. Love, No. 2:09-cr-20317-JPM, at 14–15 (W.D. Tenn. Feb. 8, 2011) (excluding testimony with conclusions of absolute or practical certainty). and instead imposing a more cautious formulation, such as a “reasonable degree of ballistic certainty.”322United States v. Diaz, No. CR 05-00167 WHA, 2007 U.S. Dist. LEXIS 13152, at *36 (N.D. Cal. Feb. 12, 2007). Other courts have used more familiar standards of proof as a frame of reference, ruling that the examiner may opine only that it is “more likely than not” that the bullet recovered from the crime scene came from the defendant’s firearm.323See Glynn, 578 F. Supp. 2d at 574–75 (limiting testimony to “more likely than not” conclusion). The table below summarizes some of the main approaches that courts have taken toward limiting such testimonial conclusions. Appendix A summarizes all thirty-seven opinions that we have located, through 2022, including unpublished trial court rulings.

Table 1.  Testimonial Limitations on Firearms Examiners
Court-ordered Conclusion Language | Citations from selected examples
“more likely than not” | United States v. Glynn, 578 F. Supp. 2d 567 (S.D.N.Y. 2008)
“reasonable degree of ballistic certainty” | United States v. Monteiro, 407 F. Supp. 2d 351 (D. Mass. 2006)
“consistent with” | United States v. Sutton, No. 2018 CF1 009709 (D.C. Super. Ct. May 9, 2022)
“a complete restriction on the characterization of certainty” | United States v. Willock, 696 F. Supp. 2d 536 (D. Md. 2010)
“the recovered firearm cannot be excluded as the source of the cartridge casing found on the scene of the alleged shooting” | United States v. Tibbs, No. 2016 CF1 19431, 2019 D.C. Super. LEXIS 9 (D.C. Super. Ct. Sept. 5, 2019); Missouri v. Goodwin-Bey, No. 1531-CR00555-01 (Mo. Cir. Ct. Dec. 16, 2016)
“qualitative opinions” can only be offered on the significance of “class characteristics” | People v. Ross, 129 N.Y.S.3d 629 (N.Y. Sup. Ct. 2020)

The approach toward firearms testimony has evolved over the past two decades. Early on, the consensus approach was shared by a series of courts that adopted the formulation “a reasonable degree of ballistic certainty.”324Diaz, 2007 U.S. Dist. LEXIS 13152, at *36; see also Commonwealth v. Pytou Heang, 942 N.E.2d 927, 945 (Mass. 2011); United States v. Simmons, No. 2:16cr130, 2018 U.S. Dist. LEXIS 18606, at *24–27 (E.D. Va. 2018); Cazares, 788 F.3d at 988; Monteiro, 407 F. Supp. 2d at 372; Taylor, 663 F. Supp. 2d at 1180; Ashburn, 88 F. Supp. 3d at 249; United States v. Hunt, 464 F. Supp. 3d 1252, 1262 (W.D. Okla. 2020). Thus, the court in Diaz allowed the examiner to testify “that cartridge cases or bullets were fired from a particular firearm ‘to a reasonable degree of ballistic certainty,’ ” as did a series of other federal courts.325Diaz, 2007 U.S. Dist. LEXIS 13152, at *36. In Monteiro, the district court ruled that the examiner could testify that the “class characteristics were in complete agreement,” but that, aside from observing that consistency to a “reasonable degree of ballistic certainty,” no further probabilistic statement could be offered.326Monteiro, 407 F. Supp. 2d at 372. The court reasoned, “Allowing the firearms examiner to testify to a reasonable degree of ballistic certainty permits the expert to offer her findings, but does not allow her to say more than is currently justified by the prevailing methodology.”327Id. at 372.

It is not clear what a reasonable degree of certainty consists of—as a result, the U.S. Department of Justice has barred examiners in federal cases from using that or similar terminology:328U.S. Dep’t of Just., Uniform Language for Testimony and Reports for the Firearms/Toolmark Discipline Pattern Analysis 3 (2020).

An examiner shall not assert that two toolmarks originated from the same source with absolute or 100% certainty, or use the expressions ‘reasonable degree of scientific certainty,’ ‘reasonable scientific certainty,’ or similar assertions of reasonable certainty in either reports or testimony unless required to do so by a judge or applicable law.329Id. at 3.

The Department also barred examiners from making assertions of a “zero error rate” or infallibility.330Id. Those requirements marked a real change from prior practice.

Second, during this time, some judges, like the Department of Justice itself, began to focus on experts’ probabilistic claims and limited toolmark experts to conclusions that do not assert infallibility or the absence of any error rate.331See, e.g., United States v. Romero-Lobato, 379 F. Supp. 3d 1111, 1117 (D. Nev. 2019) (acknowledging that the “general consensus” of the courts “is that firearm examiners should not testify that their conclusions are infallible or not subject to any rate of error, nor should they arbitrarily give a statistical probability for the accuracy of their conclusions”); State v. Terrell, No. CR170179563, 2019 Conn. Super. LEXIS 827, at *3 (Conn. Super. Ct. Mar. 21, 2019) (same); United States v. Glynn, 578 F. Supp. 2d 567, 574 (S.D.N.Y. 2008) (limiting testimony in part because when experts “make assertions that their matches are certain beyond all doubt, that the error rate of their methodology is ‘zero,’ ” there is a risk of “giving the jury the impression . . . that [the methodology] has greater reliability than its imperfect methodology permits”). Thus, courts rejected assertions of zero error rates and of being “100% sure” or “certain.”332United States v. Parker, 871 F.3d 590, 600 (8th Cir. 2017). In Monteiro, the judge rejected the use of the phrase “a match to an exact statistical certainty.”333United States v. Monteiro, 407 F. Supp. 2d 351, 355 (D. Mass. 2006). Similarly, in Gardner v. United States, the court held that the opinion could not be offered with “unqualified” certainty.334Gardner v. United States, 140 A.3d 1172, 1184 (D.C. 2016).

A growing group of judges then offered intermediate approaches. Another District of Columbia judge held that an expert can testify that ammunition is “consistent with” being fired from the same firearm.335United States v. Sutton, No. 2018 CF1 009709, at *5 (D.C. Super. Ct. May 9, 2022) (permitting the examiner to opine “that the ammunition at issue is consistent with being fired from the same firearm”). The district court in United States v. Shipp ordered that the expert “may not testify, to any degree of certainty, that the recovered firearm is the source of the recovered bullet fragment or the recovered shell casing.”336United States v. Shipp, 422 F. Supp. 3d 762, 783 (E.D.N.Y. 2019). That court carefully examined the findings of the PCAST Report, and while it did not permit characterization of the level of certainty, the examiner could offer a statement of consistency.337Id. at 778. Other courts have taken this approach.338United States v. Davis, No. 4:18-cr-00011, 2019 U.S. Dist. LEXIS 155037, at *26–27 (W.D. Va. Sept. 11, 2019).

Going further, in more recent cases judges have barred any certainty-based statements at all. Thus, in the Tibbs ruling, the court held that the examiner could not offer any probability that the firearm in question could be included, but only that “the recovered firearm cannot be excluded as the source of the cartridge casing found on the scene of the alleged shooting.”339United States v. Tibbs, No. 2016 CF1 19431, 2019 D.C. Super. LEXIS 9, at *77 (D.C. Super. Ct. Sept. 5, 2019). In the Goodwin-Bey340State v. Goodwin-Bey, No. 1531-CR00555-01, slip op. at 7 (Mo. Cir. Ct. Dec. 16, 2016). ruling, the trial court did the same.341Id. (limiting testimony “to the point this gun could not be eliminated as the source of the bullet”). In United States v. Willock, the district judge ordered “a complete restriction on the characterization of certainty.”342United States v. Willock, 696 F. Supp. 2d 536, 546 (D. Md. 2010), aff’d sub nom. United States v. Mouzone, 687 F.3d 207 (4th Cir. 2012). Other courts have taken the same approach.343See United States v. White, No. 17 Cr. 611, 2018 U.S. Dist. LEXIS 163258, at *3 (S.D.N.Y. 2018) (precluding expert from testifying “to any specific degree of certainty as to his conclusion that there is a ballistics match”). Still other cases permitted the examiner to point to features and their similarities but not to describe any level of agreement or consistency.344See, e.g., United States v. Green, 405 F. Supp. 2d 104, 124 (D. Mass. 2005); People v. Ross, 129 N.Y.S.3d 629, 642 (N.Y. Sup. Ct. 2020) (“The People may call an expert to testify as to whether there is evidence of class characteristics that would include or exclude the firearm at issue. . . . [T]he examiner may not opine on the significance of any marks other than class characteristics, as the reliability of that practice in the relevant scientific community as a whole has not been established. Moreover, any opinion based in unproven science and expressed in subjective terms such as ‘sufficient agreement’ or ‘consistent with’ may mislead the jury and will not be permitted.”).

None of these approaches follows that of the American Statistical Association, which explains that asserting any degree of probability for an event requires an established statistical basis for that assertion.345Am. Stat. Ass’n, Position on Statistical Statements for Forensic Evidence 2–3 (2019) [hereinafter ASA Report], https://www.amstat.org/asa/files/pdfs/POL-ForensicScience.pdf [https://perma.cc/T8EQ-BLZT]. Under the ASA’s approach, if no such statistical basis exists, the expert must say so.

The opinion that has gone the farthest, however, is also one of the most recent: a still-unpublished 2023 ruling by a trial judge in Cook County, Illinois. As noted, the judge wholly excluded firearms expert testimony, based on a review of scientific concerns with its reliability.346See People v. Winfield, No. 15-CR-1406601, at 32–34 (Cir. Ct. Cook Cnty. Ill. Feb. 8, 2023).

2.  Limiting Non-Class-Based Opinions

Courts in both Daubert and Frye jurisdictions have limited testimony to opinions on class characteristics only.347See, e.g., United States v. Adams, 444 F. Supp. 3d 1248, 1267 (D. Or. 2020); Ross, 129 N.Y.S.3d at 642 (“The People may proffer their NYPD ballistics detective as an expert in firearm and toolmark examination for the testimony on class characteristics as described above.”). That is, an expert can explain that the same type of gun fired the bullets or cartridge cases, but the expert cannot say that the same gun did. For example, here is the limiting instruction given by one federal judge who restricted the testimony to class characteristics:

[Firearm examiner’s] expert testimony is limited to the following observational evidence: (1) the Taurus pistol recovered in the crawlspace of [defendant’s] home is a 40 caliber, semi-automatic pistol with a hemispheric-tipped firing pin, barrel with six lands/grooves and right twist; (2) that the casings test fired from the Taurus showed 40 caliber, hemispheric firing pin impression; (3) the casings seized from outside the shooting scene were 40 caliber, with hemispheric firing pin impressions; and (4) the bullet recovered from gold Oldsmobile at the scene of the shooting were 40/l0mm caliber, with six lands/groves and a right twist.348Adams, 444 F. Supp. 3d. at 1267.

Courts have reasoned that descriptions of class characteristics are objective and measurable, whereas linking bullets to a particular gun is not “the product of a scientific inquiry,”349Id. at 1266. and “any opinion based in unproven science and expressed in subjective terms such as ‘sufficient agreement’ or ‘consistent with’ may mislead the jury and will not be permitted.”350Ross, 129 N.Y.S.3d at 642.

3.  Qualification and Proficiency Rulings

Judges have also focused on the proficiency of the particular expert to answer the preliminary question of whether a person is qualified to be an expert under Rule 702. Rule 702 requires that an expert witness have sufficient “knowledge, skill, experience, training, or education.”351Fed. R. Evid. 702 (requiring that an expert be “qualified as an expert by knowledge, skill, experience, training, or education”).

Proficiency tests are typically purchased from commercial providers, and accredited labs are required to administer such tests annually.352Forensic Service Provider Accreditation, ANSI Nat’l Accreditation Bd., https://www.anab.org/forensic-accreditation [https://perma.cc/8GR7-4LPL]. For example, one leading provider, Collaborative Testing Services (“CTS”), makes the results of its tests for each discipline available on its website. CTS has cautioned that no “error rate” can be generalized from such tests because they are designed to be elementary.353Collaborative Testing Servs., Inc., CTS Statement on the Use of Proficiency Testing Data for Error Rate Determinations 3 (2010). Such tests are not proctored, can be taken in groups, have no time limit, include materials of unknown realism and difficulty, and are not “blind,” since participants know that it is a test.354Simon A. Cole, More Than Zero: Accounting for Error in Latent Fingerprint Identification, 95 J. Crim. L. & Criminology 985, 1029–30 (2005). However, the results do highlight the types of errors that practitioners may make. For example, on a 2022 test with a very small number of items, 7 participants, or 2% of the examiners, failed to correctly identify the bullet that the known firearm had in fact fired; far more examiners reported inconclusive responses, which were also not accurate (but which CTS noted may follow lab practices).355See Collaborative Testing Servs., Inc., Firearms Examination Test No. 22-5261 Summary Report 3 (2022). CTS also noted that inconclusive responses were not counted as “outlier[]” errors, as “CTS is aware that many labs will not, as a matter of policy, report an elimination without access to the firearm or when class characteristics match.” Id.

In United States v. Cloud, the judge emphasized that one of the two examiners in the case had failed a proficiency test and was allowed to return to work only after a second proficiency test in which she had an “in-depth consultation” with a supervisor.356United States v. Cloud, 576 F. Supp. 3d 827, 847 (E.D. Wash. 2021). The court found that it could not “in good conscience qualify [the examiner] as an expert with the requisite skill to perform fingerprint comparisons when her two most recent proficiency exams either contained an error or required a significant amount of assistance from her supervisor,” a finding bolstered by the portions of her “testimony and performance reviews that touch on her skill, willingness to take correction, and confidence performing her work.”357Id. In the Willock case, the examiner’s “qualifications, proficiency and adherence to proper methods [we]re unknown.”358United States v. Willock, 696 F. Supp. 2d 536, 546 (D. Md. 2010).

Many courts traditionally focused on an expert’s credentials and self-professed expertise when conducting this inquiry into the qualifications of the witness.359See generally Brandon L. Garrett & Gregory Mitchell, The Proficiency of Experts, 166 U. Pa. L. Rev. 901 (2018) (arguing that objective evidence of proficiency, rather than credentials or self-professed expertise, should qualify experts). However, as one of the authors and Gregory Mitchell have argued, a careful inquiry into the objective proficiency of the witness should be an integral part of the question whether a person should be qualified as an expert.360See id. at 940–49. Other courts have cited the existence of proficiency testing as evidence of reliability, which, as Garrett and Mitchell discuss, is not well supported. See, e.g., United States v. Johnson, No. (S5) 16 CR. 281 (PGG), 2019 U.S. Dist. LEXIS 39590, at *46 (S.D.N.Y. Mar. 11, 2019), aff’d, 861 F. App’x 483 (2d Cir. 2021) (“While these proficiency tests do not validate the underlying assumption of uniqueness upon which the AFTE theory rests, they do provide a mechanism by which to test examiners’ ability—employing the AFTE method—to accurately determine whether bullets and cartridge casings have been fired from a particular weapon.”). Indeed, such proficiency issues can raise larger red flags concerning the reliability of a crime lab unit, and not just an individual examiner. Years before the Metropolitan Crime Lab had its accreditation revoked, as described in our introduction, a firearms examiner had failed a proficiency test after two colleagues had verified the work, implicating their own proficiency as well.361See Brandon L. Garrett, Autopsy of a Crime Lab: Exposing the Flaws in Forensics 94–95 (2021). Perhaps more careful attention to those proficiency tests could have prevented subsequent errors and systems failures in the firearms unit and the entire laboratory.

4.  As-Applied Challenges

Still other challenges have focused on Rule 702(d), which, before the December 2023 amendment, provided that qualified expert testimony is admissible only when “the expert has reliably applied the principles and methods to the facts of the case.”362Fed. R. Evid. 702(d). These “as applied” challenges focus on the work that the expert actually did: not just whether the expert followed the right steps, but whether the casework was supported by a valid method.363For a helpful explanation of what an as-applied challenge entails, see Edward J. Imwinkelried, The Admissibility of Scientific Evidence: Exploring the Significance of the Distinction Between Foundational Validity and Validity as Applied, 70 Syracuse L. Rev. 817, 832 (2020). Thus, some challenges have focused on, for example, the lack of documentation by firearms experts and the way they used their methods in a particular case.364For a case rejecting an as-applied challenge because the expert would not testify that a bullet came from a specific firearm, see United States v. Tucker, 18 CR 0119 (SJ), 2020 U.S. Dist. LEXIS 3055, at *3 (E.D.N.Y. Jan. 8, 2020). Some courts have found the presence of some documentation, such as “notes, worksheets, and photographs,” to be sufficient.365Ricks v. Pauch, No. 17-12784, 2020 U.S. Dist. LEXIS 50109, at *57 (E.D. Mich. Mar. 23, 2020); see also United States v. Harris, 502 F. Supp. 3d 28, 43 (D.D.C. 2020) (emphasizing that the expert shared “a description of his process and photo documentation.”); McNally v. State, 980 A.2d 364, 370 (Del. 2009) (finding cross-examination could adequately expose experts’ “lack of recollection” concerning application of methods).

III.  LESSONS FROM THE PATH OF FIREARMS EVIDENCE

The arc of judicial review of firearms evidence follows a pattern that is familiar in forensics more generally. Early judicial skepticism of a novel technique was overcome by new technology (the comparison microscope at the time), forceful claims to expertise by aggressive personalities (chiefly Major Goddard), some highly useful applications of the technique (simply measuring class characteristics), and the steady accumulation of precedent. Then, as scientific critiques and evidence of error rates mounted, judges began to express skepticism, which has substantially increased in recent decades, producing a large body of law limiting firearms evidence in a range of ways.

That said, we underscore that other courts have declined even to permit evidence concerning the limitations of firearms evidence, much less imposed limitations themselves. An appellate court in Missouri, for example, found no error in a judge’s refusal to allow defense attorneys to cross-examine the firearms expert concerning the findings of the NAS and PCAST reports.366State v. Mills, 623 S.W.3d 717, 729–31 (Mo. Ct. App. 2021), transfer denied (June 29, 2021) (“The trial court excluded the reports and their contents but did not deny defense counsel from asking questions about the flaws in toolmark and firearm examination as Appellant argues.”). Further, even in recent years, “many courts have continued to allow unfettered testimony from firearm examiners who have utilized the AFTE method.”367United States v. Romero-Lobato, 379 F. Supp. 3d 1111, 1117 (D. Nev. 2019) (citing David H. Kaye, Firearm-Mark Evidence: Looking Back and Looking Ahead, 68 Case W. Rsrv. L. Rev. 723, 734 (2018)).

The community of firearm examiners has mounted aggressive defenses of their work. In one memorable critique of how scientists and judges have raised questions concerning firearms comparison work, general counsel for the FBI wrote, “It is a lamentable day for science and the law when people in black robes attempt to substitute their opinions for those who wear white lab coats.”368Colonel (Ret.) Jim Agar, The Admissibility of Firearms and Toolmarks Expert Testimony in the Shadow of PCAST, 74 Baylor L. Rev. 93, 196 (2022) (“[C]ourts should recognize the long-standing reliability of the firearms identification discipline and the examiners who testify to that discipline.”). And yet it has been scientists—not judges—who have raised the deepest concerns about firearm examination. Statisticians, for example, criticize firearms comparison methods as having been “developed by insular communities of nonscientist practitioners” who, as a result, “did not incorporate effective statistical methods.”369William A. Tobin, H. David Sheets & Clifford Spiegelman, Absence of Statistical and Scientific Ethos: The Common Denominator in Deficient Forensic Practices, 4 Stats. & Pub. Pol’y 1, 1 (2017). As one litigator colorfully wrote in a Daubert brief, “Astrologers believe in the legitimacy of astrology. . . . And toolmark analysts believe in the reliability of firearms identification; their livelihoods depend on it.”370United States v. Cloud, 576 F. Supp. 3d 827, 844 (E.D. Wash. 2021).

The response to these scientific critiques has been to call them “flawed”371See Agar, supra note 368, at 166 (“Accreditation, widespread proficiency testing, the success of ATF’s NIBIN database, the Commerce Department’s recognition of firearms identification, and the reliance of the U.S. government on firearms identification to investigate and solve the assassination of a U.S. president serve as cornerstones for the ‘general acceptance’ of the firearms identification discipline.”). and double down on the claim that error rates are extraordinarily low. The FBI, for example, asserted in a 2022 case that there is an error rate of “1%.”372FBI, FBI Laboratory Response to Declaration Regarding Firearms and Toolmark Error Rates Filed in Illinois v. Winfield, May 3, 2022, at 3 (on file with authors). Federal prosecutors have repeatedly argued that “[f]irearms and toolmark identification meets all the Daubert criteria. Accordingly, there is no scientific or legal basis to exclude this evidence or even limit it.”373Gov’t’s Response to Defendant’s Motion in Limine to Exclude Ballistics Evidence, or Alternatively, for a Daubert Hearing at 23, United States v. Hunt, No. 5:19-cr-00073-R, 2020 WL 3549386 (W.D. Okla. April 27, 2020). Indeed, then-Attorney General Loretta Lynch more broadly responded to the PCAST report, upon its release, as not affecting the work of the Department of Justice: “We remain confident that, when used properly, forensic science evidence helps juries identify the guilty and clear the innocent. . . . While we appreciate their contribution to the field of scientific inquiry, the department will not be adopting the recommendations related to the admissibility of forensic science evidence.”374Gary Fields, White House Advisory Council Report Is Critical of Forensics Used in Criminal Trials, Wall St. J. (Sept. 20, 2016, 4:25 PM), https://www.wsj.com/articles/white-house-advisory-council-releases-report-critical-of-forensics-used-in-criminal-trials-1474394743 [https://perma.cc/XA3L-XHXE].

Some reactions in the field have been less defensive. Apparently in response to criticism by Judge Edelman, AFTE has opened its publications to outside viewing—one judge “applauds the publication’s changes and encourages AFTE and similar organizations to continue to open their publications up for criticism and review from the larger scientific community if they wish to meet Daubert’s rigorous standard.”375Cloud, 576 F. Supp. 3d at 842. However, the judge nevertheless found that the quality of the studies did not provide strong support for admissibility under Daubert.376Id.

One response by judges has been, as described, to limit the verbal formulations that firearms experts use when reaching conclusions. There are reasons to doubt that this compromise solution has been effective in communicating the limitations of firearms evidence to jurors. Two of us collaborated on a mock jury study examining how laypersons evaluate different firearms expert conclusions.377See Garrett et al., supra note 8. None of the limitations on firearms testimony adopted by courts, such as “reasonable scientific certainty” or “more likely than not,” had any impact on conviction rates, except for the most far-reaching language, imposed in Tibbs, which barred any conclusion linking the firearm in question and permitted only a statement that the firearm could not be excluded.378Id.

To be sure, the more recent rulings that permit only testimony concerning class characteristics go further than ruling out any language of inclusion. They limit the expert to testimony concerning objective measurements (for example, the width of the cartridge or bullet) and prevent more speculative testimony concerning probabilities that something came from a particular firearm. These rulings return firearms comparison to its roots: measuring objects. This can be useful and provide valuable information.

We have not seen judges take the approach to reliability, which is codified in Rule 702, that PCAST did, for example, insisting that “[t]he only way to establish the scientific validity and degree of reliability of a subjective forensic feature-comparison method—that is, one involving significant human judgment—is to test it empirically by seeing how often examiners actually get the right answer.”379An Addendum to the PCAST Report on Forensic Science in Criminal Courts 1 (Jan. 6, 2017).

In fact, some judges have expressly rejected this approach, stating that PCAST’s requirement of empirical study “goes beyond what is required by Rule 702.”380United States v. Harris, 502 F. Supp. 3d 28, 38 (D.D.C. 2020); see also United States v. Hunt, 464 F. Supp. 3d 1252, 1258 (W.D. Okla. 2020) (“[T]he Court declines Defendant’s invitation to restrict judicial review to techniques tested through black-box studies.”). However, there are strong reasons to think that jurors will benefit from more information regarding error rates and the reliability of the firearms comparison method, just as PCAST recommends and as mock jury experiments have found productive—even the bare acknowledgement that errors occur can affect jurors, who assume that these experts are infallible unless told otherwise.381Brandon Garrett & Gregory Mitchell, How Jurors Evaluate Fingerprint Evidence: The Relative Importance of Match Language, Method Information, and Error Acknowledgment, 10 J. Empirical Legal Stud. 484, 503 (2013).

We note that this guidance extends not just to black box-type studies of the method, but also to proficiency testing and other assessments of how well experts do their work in casework settings, as well as to blind testing, in which examiners do not know that they are being tested. Given how cognitive biases can impact the work of examiners in forensic settings, the evidence from black box studies may substantially underestimate error rates in actual casework.382See generally, e.g., Glinda S. Cooper & Vanessa Meterko, Cognitive Bias Research in Forensic Science: A Systematic Review, 297 Forensic Sci. Int’l 35 (2019). Jurors, moreover, are quite receptive to such information.383See generally, e.g., Gregory Mitchell & Brandon L. Garrett, The Impact of Proficiency Testing Information and Error Aversions on the Weight Given to Fingerprint Evidence, 37 Behav. Sci. & L. 195 (2019).

Nor have we seen judges take the approach of the American Statistical Association, which would require examiners to affirmatively state that there is no statistical basis for any probabilistic conclusion in their field.384ASA Report, supra note 345, at 4–5. Judges have, perhaps understandably, been far more comfortable limiting experts’ conclusion language than affirmatively requiring experts to explain the limitations of their methods.

The 2023 amendments to Federal Rule of Evidence 702 encourage judges to consider more carefully that the proponent of an expert bears the burden of showing both that the reliability requirements are met and that the opinions the expert formed are reliably supported by the application of the methods to the data.385Committee on Rules of Practice and Procedure, June 7, 2022 Meeting 891–93, https://www.uscourts.gov/sites/default/files/2022-06_standing_committee_agenda_book_final.pdf [https://perma.cc/B8RW-YKCN]. That rule change, while reflecting prior law and not intended to alter the substance of Rule 702, highlights the importance of judicial gatekeeping: the proponent must show that the work done, as well as the opinions reached, was grounded in a reliable interpretation of the data. The amendment supports the approach that we recommend: simply put, the exclusion of methods that are not demonstrated to be reliable. At a minimum, experts should also, as the American Statistical Association urges, disclose all of the known limitations of their work.

Despite mounting scientific concerns and a limited response to the problem of firearms testimony by the Department of Justice,386We also note proposed standards from a different group that are in progress and largely restate the AFTE identification-based approach. See Firearms & Toolmarks Subcommittee, Standards: At an SDO for Further Development & Publication, Nat’l Inst. of Standards & Tech. (Mar. 1, 2022), https://www.nist.gov/osac/firearms-toolmarks-subcommittee [https://perma.cc/6ZWD-R7ED]. there has been a substantial federal investment in increasing the use of firearms comparison work. The federal database, the National Integrated Ballistic Information Network (“NIBIN”), has been supported by extensive federal grants, including for the expensive imaging equipment used to enter firearms evidence into the database. Interestingly, the algorithms used to search that database remain a black box—the federal government has sponsored research on increasing the speed and efficiency of searches, but not on how reliable the resulting “hits” are.387Garrett, supra note 361, at 188. See generally William King, William Wells, Charles Katz, Edward Maguire & James Frank, Opening the Black Box of NIBIN: A Descriptive Process and Outcome Evaluation of the Use of NIBIN and Its Effects on Criminal Investigations (Oct. 2013), https://www.ojp.gov/pdffiles1/nij/grants/243977.pdf [https://perma.cc/2RJQ-5MGT].

Technology may eventually supply reliable means to provide quantitative information about the probability that a bullet or shell casing came from a particular firearm. Statistical approaches to this problem are under development, and one has been piloted by researchers with some promising initial results.388See CSAFE Develops New Bullet Matching Technology (Aug. 29, 2017), https://forensicstats.org/news-posts/csafe-develops-new-bullet-matching-technology [https://perma.cc/TD4H-QBZC]; Alicia Carriquiry, Heiki Hofmann, Xiao Hui Tai & Susan VanderPlas, Machine Learning in Forensic Applications, 16 Significance 29, 30–35 (2019). It may be that this is a scientific challenge that can be met. But for many decades, courts were willing to allow examiners to claim expertise that they lacked, based on assertions of experience, training, and proficiency that were not tested. Fortunately, now that those assertions have been minimally tested, some courts are stepping back to assess whether this expertise should be permitted. It is an object lesson in the acceptance and use of expert evidence in criminal courts, however, that it has taken over a century for that shift to occur.

We end by emphasizing two other points. In this Article, we have focused on firearms comparison work, but it is only one specialty in the area of forensic toolmark comparison. It is among the most commonly used and has attracted sustained scientific and judicial attention, but as David Kaye and colleagues have importantly pointed out, “there is less research into the accuracy of associating impressions from tools such as screwdrivers, crowbars, knives, and even fingernails.”389Yale Law School Forensic Science Standards Practicum, Toolmark-Comparison Testimony: A Report to the Texas Forensic Science Commission 10 (2022) (“There are fewer limiting opinions involving source attribution to other tools, probably because fewer of these examinations are performed, and fewer reports bubble up to the courts.”). There is every reason to think that those other types of toolmark comparison raise similar or far larger reliability concerns.

Further, in this Article we have focused on criminal cases that proceed to trial and on evidentiary rulings at trial and on appeal. Yet courts often do not hold a Daubert hearing or issue written rulings on expert evidence questions.390United States v. Lee, 19-cr-641, 2022 U.S. Dist. LEXIS 150054, at *7 (N.D. Ill. Aug. 22, 2022) (“[S]ince the issuance of the NRC and PCAST reports, courts unanimously continue to allow firearms identification testimony.”). There have been high-profile wrongful convictions in cases involving firearms evidence, like that of Curtis Flowers, who was tried six times, yet there are no reported decisions discussing the firearms evidence involved.391See generally Jiaxin Zhu, Liangcheng Yi, Wenqian Ma, Ziyue Zhu & Guillem Esquius, The Reliability of Forensic Evidence: The Case of Curtis Flowers, Cornell U.L. Sch. Soc. Sci. & L., https://courses2.cit.cornell.edu/sociallaw/FlowersCase/forensicevidence.html [https://web.archive.org/web/20231014224452/https://courses2.cit.cornell.edu/sociallaw/FlowersCase/forensicevidence.html]. In a notable 2020 case, a judge found it appropriate for an exonerated person to introduce experts to show that the firearms evidence should have been exculpatory at the time of trial.392See generally Ricks v. Pauch, No. 17-12784, 2020 U.S. Dist. LEXIS 50109, at *50 (E.D. Mich. Mar. 23, 2020) (denying defendant’s motion to strike plaintiff’s firearms experts). And most criminal cases are not tried. Lawyers may plea bargain cases based in part on the perceived power of a firearms comparison. Indeed, courts have regularly rejected application of Daubert reliability standards in other pretrial contexts, such as an application for probable cause relying on a firearms comparison.393See United States v. Rhodes, No. 3:19-CR-00333-IM, 2022 U.S. Dist. LEXIS 77231, at *16 (D. Or. Apr. 28, 2022) (“[P]robable cause in the context of a warrant is not subject to the Daubert standard.”). Further, laboratory audits have occurred based on revelations regarding errors in firearms work; these audits have not generated any written opinions in court, but they highlight the importance of forensic science commissions and other bodies tasked with investigating quality control failures in crime laboratories.394See generally, e.g., Tex. Forensic Sci. Comm’n, Final Report for Complaint Filed By Attorney Frank Blazek Regarding Firearm/Toolmark Analysis Performed At the Southwestern Institute of Forensic Science (April 2016), https://www.txcourts.gov/media/1440859/14-08-final-report-blazek-complaint-for-joshua-ragston-swifs-firearm-toolmark-analysis-20160419.pdf [https://perma.cc/EV5X-PC3M]; Justin Fenton, ‘Serious Questions’ Raised By Reports On Problems Inside Baltimore Police Crime Lab, Councilman Says, Baltimore Sun (Aug. 16, 2021, 2:18 PM), https://www.baltimoresun.com/news/crime/bs-md-ci-cr-crime-lab-folo-20210816-u6sbc72o25gjvfqeex4mfp2kvi-story.html [https://perma.cc/2VT4-8XX7]; Michigan State Police Forensic Science Division, Audit of the Detroit Police Department Forensic Services Laboratory Firearms Unit (2008). Thus, while there may be increasingly careful judicial review of firearms expertise in trial settings, much of the use of forensic evidence may remain largely unreviewed by judges.

CONCLUSION

We do not know how often people have been wrongly convicted based on erroneous firearms comparison conclusions. But we do know of people convicted based on firearms evidence testimony who have since been exonerated. For example, on January 16, 2019, Patrick Pursley was exonerated, in part because “evidence in 1993 was scant by today’s standards, and when you start with scant evidence you’re not in a good position to reevaluate it years later.”395Patrick Pursley, Other Murder Exonerations with False or Misleading Forensic Evidence, Nat’l Registry of Exonerations (last updated Feb. 27, 2022), https://www.law.umich.edu/special/exoneration/Pages/casedetail.aspx?caseid=5487 [https://perma.cc/E932-ACZR]. In that case, the judge found that defense experts demonstrated conclusively that the cartridge cases in question were not fired by the gun attributed to Pursley.396Id.

We have described how over the past one-hundred-plus years, judges’ initial skepticism of early firearms experts transformed into growing judicial acceptance, in large part because confident experts displayed new terminology, techniques, and technology like the comparison microscope. The result was—and still remains—“an overwhelming acceptance in the United States and worldwide of firearm identification methodology.”397United States v. Chavez, No. 15-CR-00285-LHK-1, 2021 U.S. Dist. LEXIS 237830, at *16–17 (N.D. Cal. Dec. 13, 2021). But despite a mountain of long-standing precedent, judicial acceptance of this testimony has eroded in recent years. After many decades of rote acceptance of the assumptions underlying the methodology, judicial interest in firearms expert evidence has exploded. Over half of the judicial rulings that we identified have occurred since 2009, the year that the NAS issued its pathbreaking report. Dozens of opinions limit testimony of firearms experts in increasingly stringent ways.

This sea change has occurred because of the work of lawyers, judges, and particularly scientists, who have played a key role in generating a new body of precedent. Scientists have demanded studies to examine questions of reliability, and they have exposed how the resulting studies uncovered deep concerns regarding error rates in firearms analysis. Firearms experts may have testified with confidence in the past. But today, they increasingly face defense experts who turn the microscope on the scientific flaws underlying firearms identification. In turn, judges have increasingly engaged closely with scientific research, error rate studies, and defense expert witnesses.

The Daubert revolution did not result in an immediate shift in how judges reviewed firearms evidence, but over time, judges have begun to grapple with the reliability standards. The scientific community continues to inform that work with detailed critiques. In turn, defense lawyers have launched more precise challenges that have shaped precedent.

The December 2023 revisions to Rule 702, designed to address both the burden to show that an expert is reliable and the manner in which experts reach and express conclusions, will solidify the focus—sharpened in firearms evidence rulings—on both of those important aspects of the judicial gatekeeping role. The resulting body of law has already reshaped how firearms evidence is received in criminal cases, and it provides important lessons regarding the slow, but perhaps steady, reception of science in our precedent-bound halls of justice.

APPENDIX

Appendix A.  Judicial Rulings Limiting Firearms Evidence, 2005–2022
Citation | Limitation on Testimony
United States v. Felix, No. CR 2020-0002, 2022 U.S. Dist. LEXIS 213513 (D.V.I. Nov. 28, 2022) | Limiting testimony to conclusions regarding class characteristics and whether individual toolmarkings were “consistent”
United States v. Stevenson, No. CR-21-275-RAW, 2022 U.S. Dist. LEXIS 170457 (E.D. Okla. Sept. 21, 2022) | Limiting expert to “reasonable degree of ballistic certainty”
Winfield v. Riley, No. 09-1877, 2021 U.S. Dist. LEXIS 85908 (E.D. La. 2021) | Limiting expert to “more likely than not” conclusion
United States v. Adams, 444 F. Supp. 3d 1248 (D. Or. 2020) | Observational evidence permitted, but no methods or conclusions relating to whether casings “matched” to be admitted
People v. Ross, 129 N.Y.S.3d 629 (Sup. Ct. 2020) | Ruling that “qualitative opinions” can only be offered on the significance of “class characteristics”
United States v. Hunt, 464 F. Supp. 3d 1252 (W.D. Okla. 2020) | Permitting “reasonable degree of ballistic certainty”
State v. Raynor, 254 A.3d 874 (Conn. 2020) | Permitting “more likely than not” testimony
United States v. Harris, 502 F. Supp. 3d 28 (D.D.C. 2020) | Instructing expert to abide by DOJ limitations, including not using terms like “match” and not claiming to exclude all firearms in the world
Williams v. United States, 210 A.3d 734 (D.C. 2019) | Finding error to permit expert to testify that there was not “any doubt” in conclusion
State v. Gibbs, 2019 Del. Super. LEXIS 639 (Del. Super. Ct. 2019) | May not testify to a “match” with any degree of certainty, and may not testify to a “reasonable degree” or “practical impossibility”
United States v. Tibbs, 2019 D.C. Super. LEXIS 9 (D.C. Super. Ct. 2019) | Limiting testimony to “the recovered firearm [that] cannot be excluded as the source of the cartridge casing found on the scene of the alleged shooting”
United States v. Davis, 2019 U.S. Dist. LEXIS 155037 (W.D. Va. 2019) | Preventing testimony to any form of “a match”
United States v. Shipp, 422 F. Supp. 3d 762 (E.D.N.Y. 2019) | Preventing testimony “to any degree of certainty”
United States v. Medley, No. PWG 17-242 (D. Md. Apr. 24, 2018) | Permitting “consistent with” but no opinion that it was fired by the same gun
State v. Terrell, 2019 Conn. Super. LEXIS 827 (Conn. Super. Ct. 2019) | Prohibiting testimony regarding likelihood so remote as to be practical impossibility
United States v. Simmons, 2018 U.S. Dist. LEXIS 18606 (E.D. Va. 2018) | Limiting to “a reasonable degree of ballistic . . . certainty”
United States v. White, 2018 U.S. Dist. LEXIS 163258 (S.D.N.Y. 2018) | Holding that expert may not provide any degree of certainty unless pressed on cross-examination and may then present “personal belief”
State v. Jaquwan Burton, No. CR14-0150831 (Conn. Super. Ct. Feb. 1, 2017) | Permitting “consistent with” but no opinion that it was fired by the same gun
Missouri v. Goodwin-Bey, No. 1531-CR00555-01 (Mo. Cir. Ct. Dec. 16, 2016) | Limiting to “the recovered firearm [that] cannot be excluded as the source of the cartridge casing found on the scene of the alleged shooting”
Gardner v. United States, 140 A.3d 1172 (D.C. 2016) | Error to admit “unqualified” testimony with “100% certainty”
United States v. Cazares, 788 F.3d 956 (9th Cir. 2015) | Limiting to “reasonable degree of scientific certainty”
United States v. Black, 2015 U.S. Dist. LEXIS 195072 (D. Minn. 2015) | Limiting to “reasonable degree of ballistics certainty” and barring “certain” or “100%” conclusions
United States v. Ashburn, 88 F. Supp. 3d 239 (E.D.N.Y. 2015) | Limiting to “reasonable degree of ballistics certainty” and precluding “certain” and “100%” sure statements
United States v. McCluskey, 2013 U.S. Dist. LEXIS 103723 (D.N.M. 2013) | Limiting testimony to “practical certainty” or “practical impossibility”
United States v. Mouzone, 687 F.3d 207 (4th Cir. 2012) | Approving trial ruling limiting any expression of certainty
United States v. Love, No. 2:09-cr-20317-JPM (W.D. Tenn. Feb. 8, 2011) | Barring testimony of “practical” or “absolute” certainty
Commonwealth v. Pytou Heang, 942 N.E.2d 927 (Mass. 2011) | Limiting to “reasonable degree of ballistics certainty”
United States v. Cerna, 2010 U.S. Dist. LEXIS 144424 (N.D. Cal. 2010) | Limiting to “reasonable degree of ballistics certainty”
United States v. Willock, 696 F. Supp. 2d 536 (D. Md. 2010) | Imposing “[a] complete restriction on the characterization of certainty” and precluding “practical impossibility” conclusion
United States v. Taylor, 663 F. Supp. 2d 1170 (D.N.M. 2009) | Limiting to “reasonable degree of scientific certainty”
United States v. Glynn, 578 F. Supp. 2d 567 (S.D.N.Y. 2008) | Limiting to “more likely than not”
United States v. Diaz, 2007 U.S. Dist. LEXIS 13152 (N.D. Cal. 2007) | Limiting to “reasonable degree of certainty in the ballistics field” and no testimony “to the exclusion of all other firearms in the world”
United States v. Monteiro, 407 F. Supp. 2d 351 (D. Mass. 2006) | Limiting to “reasonable degree of ballistic certainty”
Commonwealth v. Meeks, 2006 Mass. Super. LEXIS 474 (Mass. Super. Ct. 2006) | Requiring examiner to present “detailed reasons” for rulings
United States v. Green, 405 F. Supp. 2d 104 (D. Mass. 2005) | Barring “to the exclusion of all other guns” language
97 S. Cal. L. Rev. 101


* Neil Williams, Jr. Professor of Law, Duke University School of Law, Faculty Director, Wilson Center for Science and Justice. Many thanks to Anthony Braga, Mugambi Jouet, Daniel Klerman, Charles Loeffler, Thomas D. Lyon, Aurelie Ouss, Danibeth Richey, Greg Ridgeway, D. Daniel Sokol, and the participants at workshops at University of Southern California Gould School of Law, a Center for Statistics and Applications in Forensic Evidence webinar, and the Department of Criminology, University of Pennsylvania for their feedback on earlier drafts, to Stacy Renfro for feedback on the firearms case law database, to Richard Gutierrez for helpful comments, and to Hannah Bloom, Erodita Herrera, Megan Mallonee, Linda Wang, and Grace Yau for their research assistance. This work was funded (or partially funded) by the Center for Statistics and Applications in Forensic Evidence (CSAFE) through Cooperative Agreements 70NANB15H176 and 70NANB20H019 between NIST and Iowa State University, which includes activities carried out at Duke University and University of California, Irvine.

† J.D., Duke University School of Law.

‡ Visiting Research Professor of Law, University of Southern California Gould School of Law. Professor of Psychology and Criminology, University of California, Irvine.

FROM PRESENTATION TO PRESENCE: IMMERSIVE VIRTUAL ENVIRONMENTS AND UNFAIR PREJUDICE IN THE COURTROOM

Note by Khirin Bunker[*]

From Volume 92, Number 2 (January 2019)

What if you could transport your jury from a courtroom to the scene of a catastrophic event? . . . Imagine how much more empathy you would feel for the victim of a catastrophic collision if you were to experience the tragedy first-hand.[1]

Introduction

In the courtroom environment, oral presentations are becoming increasingly supplemented and replaced by advancing digital technologies that provide legal practitioners with effective demonstrative capabilities.[2] Improvements in the field of virtual reality (VR) are facilitating the creation of immersive environments in which a user’s senses and perceptions of the physical world can be completely replaced with virtual renderings.[3] As courts, lawyers, and experts continue to grapple with evidentiary questions of admissibility posed by evolving technologies in the field of computer-generated evidence (CGE),[4] issues posed by the introduction of immersive virtual environments (IVEs) into the courtroom have, until recently, remained a largely theoretical discussion.

Though the widespread use of IVEs at trial has not yet occurred, research into the practical applications of these VR technologies in the courtroom is ongoing,[5] with several studies having successfully integrated IVEs into mock scenarios. For example, in 2002, the Courtroom 21 Project (run by William & Mary Law School and the National Center for State Courts) hosted a lab trial in which a witness used an IVE.[6] The issue in the case was whether a patient’s death was the result of the design of a cholesterol-removing stent or a surgeon’s error in implanting it upside down.[7]

During the mock trial, a key defense witness who was present during the surgery donned a VR headset, which recreated the operating room, and then projected to the jury her view of the operation on a large screen as she reenacted her role in the surgery. The demonstration significantly reduced the credibility of the witness when it revealed that she could not possibly have seen the doctor’s hands or wrists.[8]

In another experiment, Swiss researchers successfully used an Oculus Rift headset and Unity 3D software to render an IVE that made it possible for a viewer to assess how close bullets came to severely injuring a victim during a shooting.[9] Using a laser scan of the crime scene, footage taken from an onlooking security camera, and the final position of the projectiles, researchers were able to reconstruct the scene of the shooting to enable viewers to review the bullet trajectories, visibility, speed, and distance.[10]
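To make the underlying geometry concrete, consider a minimal sketch, assuming only that the reconstruction yields two points on a bullet’s path and the victim’s position in a common coordinate frame; this is an illustration, not the researchers’ actual pipeline, and all names and coordinates below are hypothetical.

```python
import numpy as np

def miss_distance(traj_start, traj_end, victim):
    """Shortest distance from a point (the victim) to the line through
    two recovered points on a bullet's path."""
    p0, p1, v = map(np.asarray, (traj_start, traj_end, victim))
    direction = (p1 - p0) / np.linalg.norm(p1 - p0)   # unit vector along the path
    offset = v - p0
    # Remove the component of the offset that lies along the path;
    # what remains is perpendicular to the trajectory.
    perpendicular = offset - np.dot(offset, direction) * direction
    return float(np.linalg.norm(perpendicular))

# Hypothetical coordinates in meters, e.g., derived from a laser scan of the scene.
print(round(miss_distance((0.0, 0.0, 1.5), (10.0, 0.4, 1.4), (6.0, 0.9, 1.5)), 2))
```

A viewer inside the IVE perceives this same quantity spatially rather than as a number, which is precisely what makes the presentation vivid.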

Similarly, the Bavarian State criminal office, which currently handles the prosecution of Nazi war criminals tied to the Holocaust, applied laser scanning technology to develop a VR model of the Auschwitz concentration camp.[11] The model was recently adapted into an IVE for future use at trial, allowing jurors to examine the camp from almost any point of view.[12]

As research continues and new applications of IVE technology have been investigated, the use of VR technology is becoming increasingly mainstream and cost-effective,[13] making it more practical to use an IVE in the courtroom. As such, early adopters in civil practice have announced plans to use IVEs at trial,[14] while litigation support providers are beginning to advertise VR development services.[15] Rising use of laser imaging software and body cameras among law enforcement departments, with the capacity to be converted into an IVE format for use at trial,[16] also has significant potential to facilitate the rapid expansion of these technologies in criminal proceedings.

From the standpoint of a legal practitioner, the potential applications of IVE use at trial are numerous. As a form of evidence, IVEs have the potential to redefine the way in which litigators can recreate crime and accident scenes for the jury.[17] Rather than having a jury watch a video rendering or review images after-the-fact, an IVE could allow jury members to witness an event firsthand—from any specific moment, angle, or viewpoint.[18] As a demonstrative technology, an IVE can be easily adapted to depict eyewitness and expert testimony, explain highly technical concepts, or transport users into an interactive environment in any given scenario.[19]

While some commentators have welcomed the onset of IVEs into the courtroom as a natural progression and the next step in technological development of visual media,[20] others have argued that IVEs are fundamentally different from prior forms of evidence and warrant heightened caution due to potential prejudicial effects on juries.[21] This Note supports the latter position and, drawing on psychological research, ultimately argues for revisions to be made in the admission of IVEs as demonstrative evidence.

Part I of this Note defines and distinguishes IVEs from other forms of VR and CGE. Part II compares the treatment of substantive and demonstrative evidence under the Federal Rules of Evidence and discusses the relevant evidentiary rules for the use of an IVE as an illustrative aid. Part III outlines applicable psychological and cognitive research and potential prejudicial effects on juries stemming from the employment of IVEs in a trial setting under the current rules. Part IV examines several cases in which computer-generated animations were subjected to lower evidentiary standards and raises further concerns in applying the current rules to an IVE. Part V explains the need for revisions to the procedures for admitting an IVE as demonstrative evidence and concludes by recommending new procedures which should be implemented prior to the proliferation of IVEs in the courtroom.

I.  Distinguishing Immersive Virtual Environments

The term “virtual reality” is used in many contexts, and it is important to note the distinctions between VR technologies capable of facilitating IVEs, which are the subject of this Note, and other mediums for virtual environment (VE) interaction and display. Computer-generated VEs can be roughly grouped into three broad categories based on the level of user immersion:[22] non-immersive (desktop), semi-immersive, and immersive virtual environments.[23]


Non-immersive systems, which include Fish Tank and Desktop VR, are monitor-based VR systems where users engage with the VE through a basic desktop display using stereoscopic lenses or an inherent autostereoscopic feature.[24] These kinds of displays do not necessitate that the user wear a VR headset or glasses and typically do not surround the user visually.[25] Likewise, semi-immersive systems have similar technologies but use large screen monitors, large screen projector systems, or multiple television projection systems that increase the user’s field of view, thereby increasing the level of immersion.[26]

Separate from these categories are mixed-reality, or augmented reality (AR), technologies that combine physical and virtual objects and align them with the real-world environment.[27] AR environments create a local virtuality, which is mapped onto the physical environment around the user, rather than completely replacing the surrounding environment with a virtual one.[28]

An IVE, by contrast, “perceptually surrounds the user.”[29] This is accomplished with a combination of three-dimensional computer graphics, high-resolution stereoscopic projections, and motion tracking technologies that continually render virtual scenes to match the movements and viewpoint of the user.[30] Through the use of a head-mounted display (HMD),[31] sensory information from the physical world is replaced with the perception of a computer-generated, three-dimensional world in which the user is free to move and explore.[32] In the context of an IVE, VR can therefore be understood to mean “a computer-generated display that allows or compels the user (or users) to have a feeling of being present in an environment other than the one they are actually in and to interact with that environment.”[33]
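In software terms, the loop that sustains this effect is simple: read the tracked head pose, move the virtual camera to match, and redraw the scene, many times per second. The sketch below is illustrative only; the tracker and renderer are stand-ins rather than any particular headset’s API.

```python
import time
from dataclasses import dataclass

@dataclass
class Pose:
    position: tuple       # (x, y, z) of the user's head, in meters
    orientation: tuple    # (yaw, pitch, roll), in degrees

def read_head_pose(elapsed):
    # Stand-in for a real motion tracker; here the 'user' slowly turns their head.
    return Pose(position=(0.0, 1.7, 0.0), orientation=(elapsed * 5.0 % 360, 0.0, 0.0))

def render_stereo(scene, pose):
    # Stand-in for a real renderer: one image per eye, drawn from the tracked pose.
    return f"{scene} rendered at yaw {pose.orientation[0]:.2f} degrees"

start = time.monotonic()
for _ in range(3):        # a real system repeats this roughly 90 times per second
    pose = read_head_pose(time.monotonic() - start)
    print(render_stereo("virtual scene", pose))
    time.sleep(1 / 90)
```

Because the image is continually regenerated from the user’s own movements, there is no fixed camera viewpoint of the kind a jury sees in a conventional animation.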

The resulting sense of presence felt by the user is described as a function of an individual’s psychology,[34] representing the degree to which that user experiences a conscious presence in the virtual setting.[35] This effect on a user’s state of consciousness has been attributed to the unique vividness and interactivity of an IVE,[36] which distinguishes IVEs from prior forms of CGE.[37] This sense of consciousness created by an IVE also forms the basis for psychological concerns about the potential risks of unfair prejudice in using an IVE at trial.[38] However, prior to further discussion of the unique psychological issues raised by IVEs, it is important to understand how an IVE offered for use at trial would be evaluated under the current rules of evidence.

II.  Immersive Virtual Environments and the Federal Rules of Evidence

As previously noted, at trial, an IVE could be applied by courtroom attorneys for presentations to the jury that recreate crime and accident scenes, illustrate highly technical procedures, and demonstrate eyewitness or expert testimony. The most practical method of IVE application in the courtroom would be jurors donning individual HMDs during the course of trial or simultaneously with live testimony.

Though the use of IVEs in the courtroom remains largely unprecedented, the process for addressing the question of an IVE’s use at trial will likely be similar to that used for other forms of visual media.[39] At present, the Federal Rules of Evidence fail to make specific reference to any form of CGE, and therefore do not address the concept of an IVE.[40] Yet, in the absence of legislative revision, it is fair to assume that the admissibility of IVE evidence will be evaluated under existing basic evidentiary rules[41] as well as accompanying general principles which have developed among the courts for determining the admissibility of other forms of CGE.[42]

As a form of visual media, an IVE would need to be classified as either demonstrative—also called illustrative—or substantive evidence.[43] In the realm of CGE, courts have generally labeled 3D renderings as either computer animations (typically treated as demonstrative evidence) or computer simulations (typically treated as substantive evidence).[44] This classification is critical in determining the applicable foundational requirements, which vary due to the differing purposes for which the evidence is introduced.[45]

Substantive evidence is offered by the proponent “to help establish a fact in issue.”[46] Thus, a computer-generated simulation created through the application of scientific principles would be considered to have independent evidentiary value and therefore be evaluated as substantive evidence.[47] If treated similarly, an IVE used to reconstruct the moment of a car accident, created through software that was programmed to analyze and draw conclusions from pre-existing data (such as calculations, eyewitness testimony, and so forth) would be considered substantive evidence.[48]

One of the primary hurdles facing an IVE entered as substantive evidence at trial would be in laying the foundation for its admission.[49] Because of these foundational challenges, the primary method for introducing an IVE as substantive evidence at trial would likely be in a form accompanying expert testimony.[50] This introduction could be done in several ways: as “part of the basis for expert opinion testimony, an illustrative aid to expert testimony, or a stand-alone exhibit introduced through the testimony of an expert involved in creating the IVE.”[51] As substantive evidence, a testifying expert could draw conclusions about the accident based on the IVE simulation, and it might be admitted as an exhibit that would be made available to the jury for review in deliberations.[52] Yet, as such, both the expert who prepared the IVE and the underlying scientific principles and data used in its construction would be subject to validation.[53]

Demonstrative evidence, in contrast, is defined as “physical evidence that one can see and inspect . . . and that, while of probative value and [usually] offered to clarify testimony, does not play a direct part in the incident in question.”[54] This means that, in theory, demonstrative evidence serves merely to illustrate the verbal testimony of a witness and should not independently hold any probative value in the case.[55] As such, visual aids introduced as demonstrative evidence are not typically allowed into jury deliberations and are not relied on as the basis for expert opinion.[56] Because visual aids offered as demonstrative evidence are not formally admitted as exhibits, courts treat this kind of evidence more leniently than substantive evidence when evaluating its use at trial.[57] An IVE presented as an illustrative aid to expert testimony, rather than as a basis for expert testimony or an independent exhibit, would therefore not be subject to the same level of scrutiny as substantive evidence.[58]

Although these standards are significantly lower, an IVE offered as demonstrative evidence would still need to meet basic evidentiary standards of relevancy, fairness, and authentication.[59] However, it is important to note that the extent to which these requirements would be enforced is a question of judicial discretion and ultimately rests with the presiding trial judge.[60]

The initial inquiry into an IVE, regardless of whether it was offered for demonstrative purposes, would determine whether it was relevant under Federal Rules 401 and 402. Rule 401 would require that the IVE have a “tendency to make a fact more or less probable than it would be without the evidence” and be “of consequence in determining the action.”[61] After a preliminary determination of relevancy, and absent any restrictions in Rule 402,[62] a demonstrative IVE would also need to be authenticated using the guidelines of Rule 901.[63]

Rule 901(a) states that to “satisfy the requirement of authenticating or identifying an item of evidence, the proponent must produce evidence sufficient to support a finding that the item is what the proponent claims it is.”[64] With respect to computer-generated animations used as demonstrative evidence, the animation must “fairly and accurately reflect the underlying oral testimony . . . aid the jury’s understanding” and be authenticated by a witness.[65] Thus, an animation used solely to illustrate witness testimony requires only that the witness testify that it was an accurate representation of the testimony and,[66] in the case of an expert witness, that it would help the jury to understand the expert’s theory or opinion.[67] Using the current method for computer-generated animations, a witness with personal knowledge of the event in question or an expert who had been made aware of the circumstances surrounding the event could simply testify that the IVE was a fair and accurate portrayal of the expert’s testimony.[68]

Importantly, some commentators have posited that, as a newer technology, the foundational requirements imposed on an IVE could be higher than those required for existing forms of illustrative aid.[69] This might necessitate that the proponent of an IVE meet some or all of the more difficult foundational hurdles, briefly mentioned above, regarding the use of scientific evidence.[70] As with other questions of admissibility, however, this determination would be made by the trial judge and the imposition of additional requirements, more akin to substantive evidence, should not be taken as a certainty.[71] Though the underlying data in an IVE offered as demonstrative evidence would undoubtedly be challenged by an opposing party, similar challenges were made in the context of computer-generated animations and were rejected by the courts even during the earliest stages of that technology’s introduction into the legal system.[72]

Regardless of the method ultimately used for authentication, and despite a finding of relevance under Rules 401 and 402, an IVE could still be excluded by the trial judge under the balancing test of Rule 403.[73] Rule 403 states that “[t]he court may exclude relevant evidence if its probative value is substantially outweighed by a danger of one or more of the following: unfair prejudice, confusing the issues, misleading the jury, undue delay, wasting time, or needlessly presenting cumulative evidence.”[74] These broad standards set out by Rule 403 are a result of the high level of subjectivity required in making an admissibility determination, which essentially dictates a case-by-case analysis.[75] As such, decisions made by the trial judge pursuant to Rule 403 are largely exercises of discretion and are reviewed almost exclusively for abuse of discretion at the appellate level.[76] Although a trial judge might exclude an IVE for any of the above reasons listed under Rule 403, the distinct potential for unfair prejudice created by an IVE is the source of concern for much of the remaining discussion in this Note.

The Rule 403 advisory committee notes define unfair prejudice as “an undue tendency to suggest decision on an improper basis, commonly, though not necessarily, an emotional one.”[77] Broadly speaking, decisions to exclude a piece of evidence for unfair prejudice can be broken down into two primary categories: emotionalism and misuse of evidence.[78] Unfair prejudice caused by overreliance on emotion can be understood as evidence deemed to be “overly charged with appeal to this less rational side of human nature.”[79] Though the goal of Rule 403 is not to exclude all forms of evidence that elicit emotional response, the aim of the trial judge is to moderate the extent to which this response occurs. Aside from emotional concerns, unfair prejudice also results when evidence is misused by the jury after being deemed “admissible for one purpose (or against one party) but not another.”[80] The risk of misuse arises when there is a high likelihood “that the jury will mistakenly consider the evidence on a particular issue or against a particular party, even when properly instructed not to do so.”[81]

In either case, it is necessary for the judge to evaluate whether the probative value of the evidence is substantially outweighed by the risk of a juror’s reliance on an improper basis.[82] To do so, the judge must also take into consideration whether or not the risk can be remedied by issuing a limiting instruction.[83] In making determinations about admissibility, however, it is important for a judge to understand the unique psychological factors implicated by the use of an IVE. Without so doing, a judge may come to a decision which appears on the surface to be well-founded, but ultimately fails to consider the full extent of the risks posed by the use of an IVE. In the next Part, I will discuss several psychological and cognitive factors which should be measured when determining the admissibility of an IVE as demonstrative evidence.

III.  Potential Prejudicial Impacts of Immersive Virtual Environments on Jury Decisionmaking

A.  Designing Emotion in a Virtual Environment

As discussed in Part I, the element of presence in an IVE distinguishes this form of presentation from other forms of CGE. The concept of presence can be understood to manifest itself in a VE in three ways: via social presence, physical presence, and self presence. This Note is primarily concerned with the latter two.[84] Self presence has been defined as “a psychological state in which virtual (para-authentic or artificial) self/selves are experienced as the actual self in either sensory or nonsensory ways.”[85] Similarly, physical presence has been explained as “a psychological state in which virtual (para-authentic or artificial) physical objects are experienced as actual physical objects in either sensory or nonsensory ways.”[86] Reported experiences of both user self and physical presence in IVEs have led researchers to examine the ways in which IVEs influence user emotion, empathy, and embodiment, each of which will be addressed in turn below.

While research into the effects of IVEs on user emotion remains an active area for experimentation and debate,[87] initial studies have shown significant links between user presence in an IVE and stimulated emotion. One particular area of research has focused on the impact of emotional content in VEs and the relationship between user feelings of presence and actual user emotion.[88] The basic premise behind this type of research follows the logic that “if a dark and scary real-life environment elicits anxiety, so will a corresponding VE if the user experiences presence in it.”[89]

Following this theory, studies have been conducted involving mood induction procedures (MIPs), in which VEs have been intentionally designed to provoke specific emotional states.[90] For example, one such study presented participants with three different virtual park scenarios using an HMD with head tracking software and an accompanying joystick to facilitate movement.[91] The three park renderings shared the same virtual structure and objects (for example, trees, lamps, and so forth), but the developers manipulated the sound, music, shadows, lights, and textures with the purpose of inducing either anxiety or relaxation in users. The third park served as a neutral control that was not designed to induce any emotion.[92] Participants were assessed for emotional predisposition prior to the study, and they answered questionnaires regarding emotion and presence throughout the study.[93] The results showed significant variability in user happiness and sadness depending on which park the participant experienced.[94] The anxious park, which contained darker imagery and shadows, reduced user happiness and positive affect, while increasing feelings of sadness and anxiety.[95] In contrast, the relaxing park, which contained brighter imagery, increased user quietness and happiness, while reducing anger, sadness, anxiety, and negative affect.[96] The neutral park, however, did not elicit significant measurable changes.

Building on the same research, a more recent study exposed participants to different virtual park scenarios intentionally designed to elicit one of five specific affective states: joy, anger, boredom, anxiety, and sadness.[97] Effects on participants’ emotional reactions were measured through both physiological responses (monitoring electrodermal activity) and self-reporting. Based on these measures, researchers found they were able to induce the intended emotions in almost all cases and that they could elicit different emotional states by applying only slight changes to the lighting conditions or sounds in the VE.[98] Thus, these results lend further support to the notion that VEs may be specifically designed to induce intended emotional states through various MIPs and alterations to the design elements in a virtual scenario.[99]
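To see how spare such manipulations can be, the following sketch holds a single scene constant while swapping a handful of presentation parameters per condition; the parameter names and values are hypothetical, not drawn from the cited studies.

```python
# Hypothetical mood-induction settings: same park geometry, different dressing.
PARK_CONDITIONS = {
    "anxious":  {"ambient_light": 0.15, "shadow_strength": 0.9,
                 "soundtrack": "dissonant_drone.ogg"},
    "relaxing": {"ambient_light": 0.85, "shadow_strength": 0.2,
                 "soundtrack": "birdsong.ogg"},
    "neutral":  {"ambient_light": 0.50, "shadow_strength": 0.5,
                 "soundtrack": None},
}

def build_park(condition):
    params = PARK_CONDITIONS[condition]
    # A real engine would apply these settings to the shared park model;
    # here we simply report which dressing the user would experience.
    return f"park scene with {params}"

for condition in PARK_CONDITIONS:
    print(condition, "->", build_park(condition))
```

The point for admissibility is that none of these settings would be apparent from a description of the IVE’s content, yet each is a deliberate design choice.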

In addition to studies on inducing emotional states, others have examined the effects of IVEs on user empathy.[100] As previously noted, a fundamental difference between traditional CGE and IVEs is in the form of presentation. Any time an image is rendered on a screen, there is a possibility that a viewer will interpret the image objectively because it appears without a human operator (who would be viewed as a subjective party).[101] Yet, in a traditional CGE display, the physical surroundings of the courtroom remain within the perspective of the viewer and the animation or simulation playing on the screen often retains a fixed camera viewpoint.[102] In contrast, through an IVE, the user can effectively take on the role of any specific actor or third-party observer in any given scenario.[103]

A recent study examining the influence of a user’s point of view on his or her assessment of vehicle speed and culpability in a computer-animated car crash sequence demonstrates this effect.[104] Participants were presented with three separate animations of a two-car collision from different points of view: overhead (behind and above Car 1), internal (inside Car 1), and facing (looking directly at Car 1).[105] They were then asked to fill out a questionnaire which involved apportioning blame to either Car 1 or Car 2.[106] The study results demonstrated substantial differences in overall culpability assessments depending on the participant’s point of view, with participants apportioning 92% of the blame to Car 1 from the facing position, but only 43% from the overhead view and 34% from the internal view.[107] Though the study acknowledged limitations on ecological validity, the results were in line with Feigenson and Dunn’s hypothesis that small changes and manipulations to an observer’s point of view in a computer-generated animation may “have various legally significant effects.”[108]

In another study, participants were divided into 2 x 2 groups based on levels of immersion and user personality traits.[109] Participants then watched a documentary news series through VR-content-based or flat-screen-based technologies, depending on the immersion group.[110] The study found that presence in the VE positively influenced both empathy and embodiment—meaning that users in a higher immersion setting were more likely to feel a sense of compassion for the subjects of the news story.[111] Importantly, the authors of the study urged that immersion in a VE should be recharacterized “as a cognitive dimension alongside consciousness, awareness, understanding, empathizing, embodying, and contextualizing” rather than as a strong stimulus for facilitating illusion.[112] In other words, instead of viewing IVE technology as an illustrative aid in storytelling, it should be viewed as a factor influencing user cognition in reasoning through a proposed narrative.[113]

Based on current findings in both areas of research and despite ongoing debate regarding specific limitations and interplay between these factors in a VE, the potential for an IVE to be purposefully designed to elicit user emotions and empathy appears to exist. While relying on emotion and empathy in our day-to-day decisionmaking can be an ecologically valid tool of assessment, in the courtroom, an intentionally hermetically sealed universe, it poses a distinct risk of unintended prejudicial effects. Murtha v. City of Hartford provides an example of how these potential effects might be implicated in the trial setting.[114] In 2006, Connecticut Police Officer Robert Murtha was acquitted on all charges relating to his shooting a suspect who was evading police in a stolen car.[115] During the pursuit, the car stalled in snow on the side of the road.[116] As Murtha left his cruiser and approached the car, the suspect attempted to reenter the road and speed off. Murtha fired multiple shots into the driver’s side window that injured the fleeing driver. Dashcam footage from another police cruiser positioned behind Murtha showed him chasing the vehicle and firing into the car as it sped off.[117]

At trial, Murtha argued that his use of deadly force was justified as an act of self-defense because, at the time, he believed that the car was headed towards him.[118] Murtha presented the jury with a hybrid of the dash cam footage and a computer-generated animation to illustrate his point of view.[119] As the driver begins to pull onto the road, the original video freezes and an interspliced animation rotates the field of view from the live-action shot to a recreation of Murtha’s first-person perspective.[120] Comparing the original footage to the animation, there are some clear discrepancies: (1) the car re-enters the road at a sharper angle; (2) Murtha is placed partially within the path of the car and his gun is already drawn and extended; (3) as the car begins to drive off, Murtha moves slowly alongside the car while firing instead of running.[121] However, over the prosecutor’s objections as to the inaccuracy of the animation, the judge determined that the video was a fair and accurate depiction of Murtha’s recollection and issued a limiting instruction that the animation was not meant to depict a precise reenactment.[122]

In creating a computer-generated display, a designer’s decision to provide one viewpoint over another “can potentially alter which ‘character’ in an evidence presentation a viewer identifies with, or aligns themselves with.”[123] Through the animation in Murtha, the jury effectively took on the role of the officer in the shooting. Putting any discrepancies in the animation aside, placing the jurors in the shoes of the officer alone created the potential for unfair prejudice resulting from actor-observer bias. If the same animation in Murtha were presented in the form of an IVE, the additional factor of user presence would further complicate this potential. Based on the above studies, an IVE can be intentionally designed to elicit, or even unintentionally cause, a user to feel strong emotions, empathy, and overall self-alignment, which would significantly magnify the risk of unfair prejudice. Though these potential sources of prejudice may not ultimately have been grounds for reversal in Murtha,[124] they should be recognized as important factors when addressing the question of prejudicial effects in an IVE.

B.  Body Ownership Illusions

When an IVE user feels strongly about another person’s emotions or circumstances in a VE, this can translate into a cognitive feeling of embodiment.[125] Thus, in addition to increasing user emotion and empathy through presence, the virtual body experienced by the user can begin to feel like an analog of the user’s biological body generated through user cognition.[126] As a result, the user-tracking technologies used to facilitate an IVE uniquely involve the potential to produce body ownership illusions (“BOIs”).[127] BOIs are created when non-bodily objects (like a virtual projection or prosthetic limb) are experienced as part of the body through a perceived association with bodily sensations such as touch or movement.[128] The first experiment by Botvinick and Cohen introduced the concept of BOIs through a rubber hand illusion.[129] Participants in the original experiment had their hands concealed and a rubber hand with a similar posture was placed in front of them. An experimenter then stroked both the real and rubber hands simultaneously, causing the majority of participants to report feeling that the rubber hand was a part of their own body.[130] This phenomenon, termed the rubber hand illusion, was later shown to activate areas of the brain “associated with anxiety and interoceptive awareness” when “the fake limb is under threat and at a similar level as when the real hand is threatened.”[131] Thus, participants in one study reacted in anticipation of pain, empathic pain, and anxiety when experimenters occasionally threatened a rubber hand with a needle while participants were under the effects of a BOI.[132]

Subsequent experiments have also tested the extent to which certain multisensory factors are necessary to induce BOIs.[133] While the original experiment involved a visuotactile cue (where participants experienced a combination of visual stimulation and physical contact), further experiments have induced BOIs solely through visuomotor input.[134] Visuomotor stimulation involves participants performing active or passive movements while simultaneously seeing the artificial body (or body part) perform the same movements.[135] Most significantly, this phenomenon has been shown to occur in VEs.[136]

For example, in one study, experimenters outfitted participants with an HMD and a hand-tracking data glove and asked them to focus on the movement of a virtually projected right arm which moved synchronously with the actions of their real right arm, hand, and fingers.[137] The participants’ real right arm was located approximately twenty centimeters away from the virtual projection. Participants were then asked to use their left arm, which was not tracked or projected, to point to their right arm.[138] The participants largely tended to misidentify their real hand and instead identify the virtual hand, in some cases even after the virtual simulation had terminated.[139] The results were consistent with prior studies involving the rubber hand illusion and showed that the illusion of ownership could occur as a result of visuomotor synchrony in movements between the real and virtual hand.[140]

Additional studies of BOIs in VR have led to consistent findings that VEs can produce these effects when homogenous body parts are moved synchronously.[141] These studies have found BOIs resulting from the synchronous movement of virtual legs,[142] upper bodies,[143] and even full bodies.[144]

In an IVE, the occurrence of BOIs as a result of visuomotor stimulation has significant implications as a potential source of unfair prejudice. Beyond the concern that user emotion and empathy in an IVE might cause a juror to sympathize more with a party whose perspective he or she shares, BOIs introduce a separate issue: synchrony between a juror’s movements and those of an actor perceived in an IVE could cause the juror to temporarily feel as if he or she is that person. While some psychological studies have highlighted benefits of inducing BOIs through VR in the courtroom, for example in the potential for reducing racial biases,[145] the risk for unfair prejudice is also exceptionally high. From the standpoint of emotional prejudice, BOIs created through an IVE can both cause the viewer to feel anxious or threatened in a scenario[146] and ultimately to identify with the avatar.[147] For example, if the animation in Murtha were presented through an IVE (with jurors wearing an HMD and data gloves), the jurors could feel as if the car was coming towards their own bodies, eliciting fear or anxiety through an apprehension of contact. Moreover, this vivid and emotional experience could cause a juror to disregard conflicting pallid evidence in the case as to the car’s trajectory or the sequence of events and unduly rely on the IVE, despite its being used merely as a representation of the propounding party or witness’s theory of the case.

IV.  Problems with the Current Rules for Demonstrative Computer-Generated Evidence

A.  Case Studies

When subjecting jurors to an IVE, both presence and the phenomenon of BOIs create a unique potential for unfair prejudice. Even though IVEs are uniquely immersive and extremely vivid when introduced as demonstrative evidence, they could still remain subject to surprisingly low evidentiary standards. While the rules presented in Part II may at face value appear to be a significant burden for the proponent of an IVE, as stated previously, the characterization of an IVE as substantive or demonstrative and the broad discretion afforded to trial judges can significantly impact the extent to which the rules are used to allow the use of an IVE at trial. The treatment of CGE in the following cases is illustrative of the more lenient approach applied in many jurisdictions when dealing with demonstrative evidence.[148]

In Commonwealth v. Serge, a defendant found guilty of first-degree murder for killing his wife appealed the State’s use of a computer-generated animation as demonstrative evidence.[149] The animation, introduced to illustrate the expert testimonies of a forensic pathologist and a crime scene reconstructionist, purported to show the manner in which the defendant shot his wife.[150] Prior to admitting the animation, the trial court required that it be authenticated as both a fair and accurate depiction of the testimony and that any potentially inflammatory material be excluded.[151] The trial court also issued a lengthy jury instruction at trial cautioning that the animation was a demonstrative exhibit for the sole purpose of illustrating expert testimony and cautioned the jury not to “confuse art with reality.”[152] The defendant challenged the animation as unfairly prejudicial and improperly authenticated under Pennsylvania Rule of Evidence 901(a) given that the depictions were unsupported by the record or the accompanying expert opinions.[153] The Pennsylvania Supreme Court found both that the animation was a proper depiction of the witness testimony and that the limiting instruction and lack of dramatic elements in the animation were sufficient to eliminate any concerns over prejudice.[154] The court affirmed the admissibility of the animation and held that the animation properly satisfied the basic requirements of Pennsylvania Rules of Evidence 401, 402, 403, and 901.[155]

More recently, in a Utah case, State v. Perea, a defendant convicted of two counts of aggravated murder and two counts of attempted murder appealed his sentence, arguing, in part, that computer-generated animations, excluded by the district court, were sufficiently authenticated under Utah Rule of Evidence 901(a).[156] At trial, the defendant attempted to introduce two animations to visually represent the testimony of a crime scene reconstruction expert.[157] The expert testified that “although he did not personally create the animations, they ‘g[a]ve an indication of what [he] believe[d] may have happened,’” making it easier for the jury to understand his testimony.[158] The State objected for lack of foundation and on the grounds that the animations did not accurately represent the facts, because under the State’s theory there was only one shooter.[159] Reversing the ruling of the district court, the Utah Supreme Court held that despite a lack of knowledge about the creation of the animation on the part of the testifying expert, Rule 901 “does not require that the demonstrative evidence be uncontroversial, but only that it accurately represents what its proponent claims.”[160] The district court’s exclusion was an error because the crime scene reconstruction expert confirmed that the animations accurately represented his interpretation of the facts.

In both cases, the computer-generated animations were deemed relevant under Rules 401 and 402, properly authenticated under Rule 901, and found to pass the balancing test of Rule 403. However, in neither case were the proponents of the animations obligated to meet foundational requirements beyond an assertion that the animation “fairly and accurately” depicted the testimony of the witnesses—despite the fact that the animations were constructed solely using witness testimony about their memories of the event. Additionally, both courts found that the trial court’s issuance of limiting instructions to the jury was sufficient to combat any prejudicial effects. Under examination, the courts’ analyses contain multiple flaws which would be further complicated if IVEs were at issue.

B.  Issues with the Courts’ Analyses

First, in creating computer-generated representations of a witness’s testimony “[n]o matter how much evidence exists, there is never enough to fill in every detail necessary. . . . The expert (or the animator) must make assumptions to fill in the blanks.”[161] In Serge, like in Murtha, the animators took significant liberties in creating the animation.[162] By placing a knife next to the victim and dressing the defendant’s character in red plaid, the animators made decisions that were not necessarily supported by the physical evidence but were then authenticated by the accompanying witness’s memory or an expert’s theory as to what happened.[163]

Like an animation, the creation of an IVE inevitably involves choices by a designer regarding not only what is perceived, but also how it is perceived. Without proper safeguards or consideration, a party at trial could ostensibly introduce an IVE for demonstrative purposes which appeared to be sufficiently limited in emotional content in the eyes of the trial judge but was designed using MIPs to subtly influence jury attitudes towards a given scenario. For example, in arguing a self-defense claim, a party could ask designers of an IVE to select color palettes and illumination levels more likely to elicit fear and anxiety.[164] As explained in Part III, even subtle or indirect changes to factors such as lighting, point of view, level of interactivity, or synchrony of movement can have significant psychological implications for users of an IVE.[165] However, none of these factors are involved in the current analysis for demonstrative CGE in many jurisdictions.[166]

Second, it seems clear that in combatting highly vivid demonstrative evidence, “the opponent of the animation should be allowed [on cross-examination] to demonstrate to the jury that the . . . animation [is] based, at least partially, on assumptions and conjectures, and not on purely objective, scientific factual determinations.”[167] Yet, under the current standards for demonstrative CGE, many jurisdictions do not require the testifying witness to have personal knowledge regarding the creation of the animation.[168] In Perea, for example, the animation was admitted despite the accompanying witness possessing no information about the creation of the animation.[169] A similar decision by a trial judge to admit an IVE as demonstrative evidence, without an accompanying witness having knowledge about the decisions or assumptions made in creating the IVE, would likewise significantly disadvantage an opponent in combatting its highly vivid qualities through cross-examination.

Third, both courts relied heavily on jury instructions to moderate the potential prejudicial impacts of the animations on the jury.[170] Though the general rule is to assume that juries will abide by limiting instructions,[171] the Supreme Court has previously recognized that “there are some contexts in which the risk that the jury will not, or cannot, follow instructions is so great . . . that the practical and human limitations of the jury system cannot be ignored.”[172] Moreover, research in the field of social psychology has “repeatedly demonstrated that . . . limiting instructions are unsuccessful at controlling jurors’ cognitive processes.”[173] While this does not necessitate the presumption that all jury instructions are ineffective, it does call into question whether a jury subjected to the highly vivid and unique psychological effects of an IVE might have trouble following a judge’s directions as to the permissible and impermissible purposes for its use.

V.  Recommendations

In anticipation of the onset of IVEs in the courtroom, this Note proposes several changes to the current standards for admissibility, as well as judicial guidelines for best practice in moderating the prejudicial impacts of IVEs.

A.  Stricter Foundational Requirements

Though it would be impractical to develop a “one-size-fits-all” method for dealing with the numerous potential contexts and purposes for which an IVE might be offered as demonstrative evidence, uniformly increasing the foundational requirements for admitting demonstrative IVEs would help to combat some of the potential sources of prejudice.

In State v. Swinton, the Connecticut Supreme Court recognized the need for changes in the rules governing demonstrative evidence with regard to evolving computer technologies.[174] Addressing the binary distinction of the courts between computer animations and computer simulations, the court recognized that there are some kinds of evidence which do “not fall cleanly within either category.”[175] Though Swinton addressed the enhancement of photographs through Adobe Photoshop, the court’s discussion is particularly applicable in relation to an IVE.[176] The court found that “the difference between presenting evidence and creating evidence was blurred”[177] and endorsed a previously established general rule requiring that in all cases involving CGE there be “testimony by a person with some degree of computer expertise, who has sufficient knowledge to be examined and cross-examined about the functioning of the computer.”[178] In addition, the court went one step further in setting out factors with which the expert should be familiar and which could be weighed in determining the reliability of, and adherence to, procedural requirements.[179]

Adopting the court’s logic, this Note recommends that as a basic requirement, an expert who prepared the IVE should be present at the trial to testify regarding the expert’s qualifications and the underlying processes used to create the IVE. This would ensure that the opposing party has the opportunity to cross-examine the expert regarding the underlying data and assumptions used in its creation. In continued recognition of the differences between substantive and demonstrative evidence, this would not necessitate that the proponent satisfy all of the requirements for admitting scientific evidence under Rule 702 (and the Daubert or Frye tests).[180] However, this would at least afford the opposing party the opportunity to cross-examine someone with personal knowledge of the IVE technology and its creation.[181]

B.  Evaluating and Limiting Prejudicial Effects

While establishing an adequate foundation by requiring the presence of an informed expert works to combat some of the unfairness stemming from unreliability and misuse of evidence under the current demonstrative standards, this alone is insufficient to curb the significant potential for prejudice. In addition to raising the foundational requirements, there are several factors which should be considered by a judge in conducting the Rule 403 balancing test. In addressing the potential for jurors’ unfair reliance on an IVE, consideration of the factors identified in Part III, chiefly the role of presence and BOIs, should be a necessary predicate to admission. This would require judges to scrutinize not only the design factors in an IVE, but also the level of interactivity facilitated.

Interestingly, beyond mere consideration of such factors, it may also be possible for judges to take affirmative steps to impose limitations on an IVE which could help to mitigate juror overreliance. As this Note has repeatedly stated, the source of much of the potential for prejudice created by IVEs is their unique vividness and interactivity, which produce feelings of presence and body ownership in the user.[182] Both psychological presence research and BOI studies indicate that there may be ways to limit, reduce, or remove the feelings of presence and ownership in a VE.[183] Such phenomena, termed “breaks in presence” (BIPs),[184] occur when the user’s feelings of ownership or consciousness within the VE are disrupted by perceived virtual or real-world interferences.[185]

Under Rule 611(a), judges have broad authority to regulate the admission of demonstrative evidence.[186] As such, judges could potentially use BIPs to mitigate the prejudicial effects of an IVE. Multiple studies have concluded that BOIs occur in VEs only when the movements depicted are relatively synchronous.[187] Because of this, “[w]hen there is asynchrony the illusion does not occur.”[188] With this knowledge, a judge would have the option to instruct the proponent of an IVE to increase the latency (delay) between the movements of the juror and the avatar, thereby reducing the likelihood that a BOI would occur. In another study, examiners found that replacing a perceived limb with a virtual arrow indicator would similarly reduce the BOI phenomenon.[189] Thus, an alternative option might be to instruct the proponent to limit the realistic qualities of the avatar by replacing human features with indicators. Naturally, as further studies are completed and the concepts of presence and ownership in VEs become better understood, so too will the options available to judges in imposing limitations.
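As a rough sketch of how a latency limitation might work in practice (the frame counts and interfaces below are assumptions, not drawn from the cited studies), the avatar can simply replay the juror’s tracked poses from a fixed number of frames in the past, guaranteeing the asynchrony that defeats the illusion.

```python
from collections import deque

class DelayedAvatar:
    """Replays tracked poses after a fixed frame delay, so the avatar's
    movement is deliberately asynchronous with the juror's own movement."""
    def __init__(self, delay_frames):
        self.delay_frames = delay_frames
        self.buffer = deque()

    def update(self, tracked_pose):
        self.buffer.append(tracked_pose)
        if len(self.buffer) > self.delay_frames:
            return self.buffer.popleft()   # pose from delay_frames ago
        return self.buffer[0]              # hold the first pose until the buffer fills

# A 2-frame delay for demonstration; at 90 frames per second, a disruptive
# lag would span dozens of frames (several hundred milliseconds).
avatar = DelayedAvatar(delay_frames=2)
for frame, yaw in enumerate([0, 5, 10, 15, 20]):
    shown = avatar.update({"yaw": yaw})
    print(f"frame {frame}: juror yaw {yaw}, avatar shows {shown}")
```

Whether such a deliberately degraded presentation still aids the jury is a separate question, but the sketch illustrates that the degree of immersion is itself an adjustable parameter rather than a fixed property of the medium.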

Conclusion

As was recognized by the drafters of the Federal Rules of Evidence, it is difficult to define bright-line admissibility rules.[190] Despite these difficulties, the current treatment of demonstrative evidence in many jurisdictions does not properly accommodate IVEs. Though it may appear contrary to logic to think that an IVE could be treated like a chart or graph in the courtroom, under current standards this might very well become the case in some jurisdictions. This author agrees that “every new development is eligible for a first day in court”;[191] however, we as a legal community should be cognizant of the differences between past and emerging technologies and of the potential prejudicial risks newer technologies may pose. It is inevitable that IVEs will continue to make their way into the courtroom, but they should not proceed unchecked. The proposed increase in authentication requirements, as well as the potential factors for judges in evaluating and moderating the use of IVEs in the courtroom, are but an initial step in integrating IVEs for courtroom use. Thus, it remains essential that further psychological and cognitive studies be conducted with regard to the use of IVEs in the courtroom.


 [*]. Senior Submissions Editor, Southern California Law Review, Volume 92; J.D. Candidate 2019, University of Southern California Gould School of Law; B.A. 2015, University of California, Riverside. My sincere gratitude to Professor Dan Simon for his guidance and the editors of the Southern California Law Review for their excellent work. I would also like to thank my parents, Pamela and Robert Bunker, for their unwavering support and encouragement.

 [1]. High Impact Bringing Virtual Reality to the Courtroom, High Impact, https://highimpact.com/news/High-Impact-to-Bring-Virtual-Reality-to-the-Courtroom (last visited Jan. 23, 2019).

 [2]. Damian Schofield, The Use of Computer Generated Imagery in Legal Proceedings, 13 Digital Evidence & Electronic Signature L. Rev. 3, 3 (2016). Some commentators have attributed the increase in use of computer-generated evidence (“CGE”) in the courtroom to three primary factors: (1) we have become a more visual society; (2) people retain much more of what they see than what they hear; and (3) technological advancements and decreasing costs are making this form of evidence more affordable for clients. See Mary C. Kelly & Jack N. Bernstein, Comment, Virtual Reality: The Reality of Getting It Admitted, 13 John Marshall J. Computer & Info. L. 145, 148–50 (1994).

 [3]. Carrie Leonetti & Jeremy Bailenson, High-Tech View: The Use of Immersive Virtual Environments in Jury Trials, 93 Marq. L. Rev. 1073, 1073 (2010).

 [4]. Compare Betsy S. Fielder, Are Your Eyes Deceiving You?: The Evidentiary Crisis Regarding the Admissibility of Computer Generated Evidence, 48 N.Y.L. Sch. L. Rev. 295 (2003) (discussing potential problems posed by the use of CGE), and Gareth Norris, Computer-Generated Exhibits, the Use and Abuse of Animations in Legal Proceedings, 40 Brief 10 (2011) (weighing the pros and cons of computer-generated animations in the courtroom), with Fred Galves, Where the Not-So-Wild Things Are: Computers in the Courtroom, the Federal Rules Of Evidence, and the Need for Institutional Reform and More Judicial Acceptance, 13 Harv. J.L. & Tech. 161 (2000) (arguing that computer-generated animations are akin to earlier forms of demonstrative media and should be introduced into the courtroom under existing standards).

 [5]. See, e.g., Juries ‘Could Enter Virtual Crime Scenes’ Following Research, BBC (May 24, 2016), http://www.bbc.com/news/uk-england-stoke-staffordshire-36363172 (reporting on a £140,000 European Commission grant to the Staffordshire University project for research and experiments on technology and techniques to transport jurors to virtual crime scenes).

 [6]. Fredric I. Lederer, The Courtroom 21 Project: Creating the Courtroom of the Twenty-First Century, 43 Judges’ J., Winter 2004, at 39, 42.

 [7]. Id.

 [8]. Id.

 [9]. Lars C. Ebert et al., The Forensic Holodeck: An Immersive Display for Forensic Crime Scene Reconstructions, 10 Forensic Sci. Med. Pathology 623, 624–26 (2014).

 [10]. Id. A similar virtual reality (“VR”) reconstruction was developed in the United States by Emblematic Group in 2012 using audio files of 911 calls, witness testimony, and architectural drawings to re-create the events of the widely publicized Trayvon Martin shooting. Emblematic Group, One Dark Night-Emblematic Group VR, YouTube (May 9, 2015), https://www.youtube.com/watch?v=1hW7WcwdnEg. It is also offered for download in the Google Play and Steam stores. See Mike McPhate, California Today: In Virtual Reality, Investigating the Trayvon Martin Case, N.Y. Times (Feb. 24, 2017), https://nyti.ms/2mflo8f (interviewing one of the creators).

 [11]. See Marc Cieslak, Virtual Reality to Aid Auschwitz War Trials of Concentration Camp Guards, BBC (Nov. 20, 2016), http://www.bbc.com/news/technology-38026007.

 [12]. Although the immersive virtual environment (“IVE”) version has not yet been used at trial, the same 3-D model was previously utilized in the prosecution of wartime SS camp guard Reinhold Hanning to help assert his point of view from his post at a watchtower in the camp. Id.

 [13]. Basic VR headsets can be purchased for under $100 (for example, Google Cardboard and Samsung Gear VR), with more high-end headsets costing around $600 (for example, Oculus Rift and HTC Vive). See John Gaudiosi, Over 200 Million VR Headsets to Be Sold by 2020, Fortune (Jan. 21, 2016), http://fortune.com/2016/01/21/200-million-vr-headsets-2020; see also Stevi Rex, Global Virtual Reality Industry to Reach $7.2 Billion in Revenues in 2017, Greenlight Insights (Apr. 11, 2017), https://greenlightinsights.com/virtual-reality-industry-report-7b-2017 (forecasting global VR product sales to reach $7.2 billion by the end of 2017).

 [14]. See, e.g., Lamber Goodnow Legal Team Brings Virtual Reality Technology to the Courtroom, PR Newswire (Jan. 27, 2017), https://www.prnewswire.com/news-releases/lamber-goodnow-legal-team-brings-virtual-reality-technology-to-the-courtroom-300397710.html (reporting on Arizona personal injury firm advertising use of VR in pending cases) (“In the old days, I’d use demonstrative exhibits, visual aids and witness statements in an attempt to ‘transport a jury to an accident scene.’ With virtual reality, not only can I transport jurors to the accident scene, I can put them in the car at impact.”).

 [15]. See, e.g., High Impact Bringing Virtual Reality to the Courtroom, supra note 1.

 [16]. See Nsikan Akpan, How Cops Used Virtual Reality to Recreate Tamir Rice, San Bernardino Shootings, PBS News Hour (Jan. 13, 2016, 5:00 PM), https://www.pbs.org/newshour/science/virtual-reality-tamir-rice-3d-laser-scans-shootings-san-bernardino (discussing law enforcement agencies’ use of laser scanners at crime scenes and current projects to convert these kinds of scans for use with VR headsets) (“That’s what I see coming. We’re going to be putting these goggles on juries and say look around and tell me what you see.”). For more on various types of 3-D laser scanning devices employed by law enforcement in the United States, including use with drone technologies, see Robert Galvin, Capture the Crime Scene, Officer (July 19, 2017), https://www.officer.com/investigations/article/12339566/3d-crime-scene-documentation-for-law-enforcement.

 [17]. See Jeremy N. Bailenson et al., Courtroom Applications of Virtual Environments, Immersive Virtual Environments, and Collaborative Virtual Environments, 28 Law & Pol’y 249, 255–58 (2006).

 [18]. Leonetti & Bailenson, supra note 3, at 1076.

 [19]. See Bailenson et al., supra note 17, at 258–60.

 [20]. Leonetti & Bailenson, supra note 3, at 1118.

 [21]. Caitlin O. Young, Note, Employing Virtual Reality Technology at Trial: New Issues Posed by Rapid Technological Advances and Their Effects on Jurors’ Search for “The Truth,” 93 Tex. L. Rev. 257, 258 (2014).

 [22]. For further explanation of the concept of immersion in virtual environments (“VEs”), see Mel Slater & Sylvia Wilbur, A Framework for Immersive Virtual Environments (FIVE): Speculations on the Role of Presence in Virtual Environments, 6 Presence 603, 604–05 (1997) (“Immersion is a description of a technology, and describes the extent to which the computer displays are capable of delivering an inclusive, extensive, surrounding and vivid illusion of reality to the senses of a human participant.” (emphasis in original)).

 [23]. Patrick Costello, Health and Safety Issues Associated with Virtual Reality – A Review of Current Literature 6–8 (1997), http://www.agocg.ac.uk/reports/virtual/37/37.pdf.

 [24]. See Frank Steinicke et al., Interscopic User Interface Concepts for Fish Tank Virtual Reality Systems, in 2007 IEEE Virtual Reality Conference 27, 27–28 (2007).

 [25]. George Robertson et al., Immersion in Desktop Virtual Reality, in Proceedings of the 10th Annual ACM Symposium on User Interface Software and Technology 11, 11 (1997); see also Steinicke et al., supra note 24, at 27. Modern-day desktop VR examples can be seen in video games, like the Call of Duty franchise, where users control their in-game avatars through a handheld controller or mouse/keyboard interface. These kinds of video games can be played from both first-person and third-person perspectives, and computer-generated animations are rendered on a screen (primarily television and computer monitors).

 [26]. Steinicke et al., supra note 24, at 27.

 [27]. See D.W.F. van Krevelen & R. Poelman, A Survey of Augmented Reality Technologies, Applications and Limitations, 9 Int’l J. Virtual Reality, no. 2, 2010, at 1, 1.

 [28]. Id. A popular example of this type of technology can be seen in Niantic’s Pokémon Go, which was released for mobile devices in July 2016. The game utilizes a user’s phone or tablet camera (which depicts the surrounding physical environment) and overlays virtual animations of monsters onto the camera image. Users can interact with the monsters through the touch-screen interface, and the user’s real-world movements are tracked using the device’s GPS services. See Pokémon Go, https://support.pokemongo.nianticlabs.com/hc/en-us (last visited Dec. 28, 2018).

 [29]. See Bailenson et al., supra note 17, at 251.

 [30]. Id. at 250–53, 259.

 [31]. An alternative configuration is a Cave Automatic Virtual Environment (“CAVE”) where the user moves in a room surrounded by rear-projection screens. The user, wearing stereoscopic glasses instead of a head-mounted display (“HMD”), is tracked through an electromagnetic device and updated visual images are reflected on the screens. See id. at 253.

 [32]. Id.

 [33]. Ralph Schroeder, Social Interaction in Virtual Environments: Key Issues, Common Themes, and a Framework for Research, in The Social Life of Avatars 1, 2 (2002) (citation omitted).

 [34]. For a comprehensive overview of studies on user feelings of presence in IVEs, see generally James J. Cummings & Jeremy N. Bailenson, How Immersive Is Enough? A Meta-Analysis of the Effect of Immersive Technology on User Presence, 19 Media Psychol. 272 (2016) (analyzing data collected from eighty-three studies on immersive system technology and user experiences of presence).

 [35]. Id. at 274. Of the factors relating to the level of user presence, “results show that increased levels of user-tracking, the use of stereoscopic visuals, and wider fields of view of visual displays are significantly more impactful than improvements to most other immersive system features, including quality of visual and auditory content.” Id. at 272.

 [36]. Neal Feigenson, Too Real? The Future of Virtual Reality Evidence, 28 Law & Pol’y 271, 273 (2006). Vividness means the extent to which the display forms a “sensorially rich environment,” and interactivity results from the ability of the user to “influence the form or content of the mediated environment.” Id.

 [37]. See Young, supra note 21, at 261.

 [38]. See infra Part III.

 [39]. Leonetti & Bailenson, supra note 3, at 1077.

 [40]. See generally Fed. R. Evid.

 [41]. Feigenson, supra note 36, at 276.

 [42]. See generally Laura Wilkinson Smalley, Establishing Foundation to Admit Computer-Generated Evidence as Demonstrative or Substantive Evidence, 57 Am. Juris. Proof of Facts 3d 455 (Westlaw 2018) (providing an overview of the various legal foundations for CGE’s admission into evidence).

 [43]. Karen L. Campbell et al., Avatar in the Courtroom: Is 3D Technology Ready for Primetime?, 63 Fed’n Def. & Corp. Counsel Q. 295, 296 (2013).

 [44]. Id.

 [45]. Id. at 298.

 [46]. Substantive Evidence, Black’s Law Dictionary (10th ed. 2014).

 [47]. Kurtis A. Kemper, Admissibility of Computer–Generated Animation, 111 A.L.R. 5th 529, § 2 (2003).

 [48]. Id.

 [49]. Leonetti & Bailenson, supra note 3, at 1098–99 (“The impediments that a proponent of an IVE would face, under Rule 403, the best evidence rule, or Rule 901, are chiefly matters of foundation, i.e., the admissibility of an IVE turns on whether the proponent could establish its accuracy, reliability, and authenticity.”).

 [50]. Id. For example, a blood spatter analyst could use a recreation of the crime scene to explain her findings.

 [51]. Id. at 1099 (footnotes omitted). For a comprehensive view of potential courtroom and pre-trial IVE applications, see generally Bailenson et al., supra note 17.

 [52]. Leonetti & Bailenson, supra note 3, at 1099.

 [53]. Campbell et al., supra note 43, at 299. Admission thus requires a sufficient showing of:

(1) the qualifications of the expert who prepared the simulation and (2) the capability and reliability of the computer hardware and software used to create the simulation . . . [that] (3) the calculations and processing of data were done on the basis of principles meeting the standards for scientific evidence under Rule 702; (4) the data used to make the calculations were reliable, relevant, complete, and input properly; and (5) the process produced an accurate result.

Id.

 [54]. Demonstrative Evidence, Black’s Law Dictionary (10th ed. 2014).

 [55]. I. Neel Chatterjee, Admitting Computer Animations: More Caution and New Approach Are Needed, 62 Def. Couns. J. 36, 37 (1995).

 [56]. Smalley, supra note 42, § 8.

 [57]. Id.

 [58]. Despite the fact that an IVE would utilize computer programming to create the illustrative aid, the separate treatment of an IVE as demonstrative or substantive evidence would not depend on whether VR technology was employed to achieve the rendering. See Galves, supra note 4, at 228 (“Although demonstrative animations use programs in design, the substantive result they create is based on the witness’s testimony rather than numerical calculations and other underlying input data.”).

 [59]. Feigenson, supra note 36, at 276. Although demonstrative evidence is not technically “evidence” in the context of the Federal Rules, standards of relevance, fairness, and authentication are still enforced by courts in weighing the admissibility of demonstrative evidence through analogy. Id.

 [60]. See Fed. R. Evid. 611(a). “The court should exercise reasonable control over the mode and order of examining witnesses and presenting evidence so as to: (1) make those procedures effective for determining the truth; (2) avoid wasting time; and (3) protect witnesses from harassment or undue embarrassment.” Id.

 [61]. Fed. R. Evid. 401.

 [62]. See Fed. R. Evid. 402. “Relevant evidence is admissible unless any of the following provides otherwise: the United States Constitution; a federal statute; these rules; or other rules prescribed by the Supreme Court. Irrelevant evidence is not admissible.” Id.

 [63]. See Fed. R. Evid. 901(a).

 [64]. Id.

 [65]. Chatterjee, supra note 55, at 37.

 [66]. Smalley, supra note 42, § 9.

 [67]. See, e.g., Gosser v. Commonwealth, 31 S.W.3d 897, 903 (Ky. 2000) (“[B]ecause a computer-generated diagram, like any diagram, is merely illustrative of a witness’s testimony, its admission normally does not depend on testimony as to how the diagram was prepared, e.g., how the data was gathered or inputted into the computer.”), abrogated on other grounds by Elery v. Commonwealth, 368 S.W.3d 78 (Ky. 2012).

 [68]. See Fed. R. Evid. 901(b)(1). Significantly, this would include a re-creation of a scene or accident based on the personal knowledge of a sponsoring witness. See Leonetti & Bailenson, supra note 3, at 1098.

 [69]. See Feigenson, supra note 36, at 277.

 [70]. Campbell et al., supra note 43, at 299.

 [71]. Though, as argued in Part V, subjecting all IVE evidence to more substantive standards could have a moderating effect on some of the concerns raised in Part III.

 [72]. See, e.g., People v. McHugh, 476 N.Y.S.2d 721, 722–23 (Sup. Ct. 1984) (rejecting a motion for a pre-trial Frye hearing despite no prior instances of computer-generated animations being used at trial) (“While this appears to be the first time such a graphic computer presentation has been offered at a criminal trial, every new development is eligible for a first day in court.”); see also People v. Hood, 62 Cal. Rptr. 2d 137, 140 (Ct. App. 1997) (holding that the Kelly formulation for “new scientific procedures” does not apply to computer-generated animations when introduced as demonstrative evidence).

 [73]. See Fed. R. Evid. 403.

 [74]. Id.

 [75]. Christopher B. Mueller & Laird C. Kirkpatrick, Federal Evidence § 4:12 (4th ed. 2013) (“Much depends on surrounding facts, circumstances, issues, the conduct of trial, and the evidence adduced already and expected as proceedings move forward.”).

 [76]. Id.

 [77]. Fed. R. Evid. 403, advisory committee’s notes to 1972 proposed rules.

 [78]. Mueller & Kirkpatrick, supra note 75, § 4:13.

 [79]. Id.

[E]vidence is unfairly prejudicial in the sense of being too emotional if it is best characterized as sensational or shocking; if it provokes anger, inflames passions, or if it arouses overwhelmingly sympathetic reactions; provokes hostility or revulsion; arouses punitive impulses; or appeals to emotion in ways that seem likely to overpower reason.

Id. (footnotes omitted).

 [80]. Id.

 [81]. Id.; see, e.g., United States v. Brown, 490 F.2d 758, 764 (D.C. Cir. 1973) (“Despite a limiting instruction to the effect that the evidence is to be considered solely on the issue of the declarant’s state of mind (the proper purpose), there is the ever-present danger that the jury will be unwilling or unable to so confine itself.”).

 [82]. Fed. R. Evid. 403, advisory committee’s notes to 1972 proposed rules.

 [83]. Id.

 [84]. See Kwan Min Lee, Presence, Explicated, 14 Comm. Theory 27, 42 (2004). Though important with respect to the study of co-presence and other social phenomena experienced in an IVE, social presence falls outside the scope of this Note. Social presence pertains to the way in which virtually rendered social actors are experienced as actual social actors by a user and is an important concept in understanding feelings of co-presence between multiple users in a VE. For more on social presence, see id. at 45.

 [85]. Id. at 46.

 [86]. Id. at 44.

 [87]. Julia Diemer et al., The Impact of Perception and Presence on Emotional Reactions: A Review of Research in Virtual Reality, 6 Frontiers Psychol., Jan. 2015, at 1.

 [88]. See R.M. Baños et al., Immersion and Emotion: Their Impact on the Sense of Presence, 7 CyberPsychology & Behav. 734, 735 (2004); see also Rosa M. Baños et al., Presence and Emotions in Virtual Environments: The Influence of Stereoscopy, 11 CyberPsychology & Behav. 1, 2–3 (2008).

 [89]. Anna Felnhofer et al., Is Virtual Reality Emotionally Arousing? Investigating Five Emotion Inducing Virtual Park Scenarios, 48 Int’l J. Hum.-Computer Stud. 48, 49 (2015) (citation omitted).

 [90]. For a seminal text on psychological laboratory designs for mood induction procedures, see generally Maryanne Martin, On the Induction of Mood, 10 Clinical Psychol. Rev. 669 (1990).

 [91]. Giuseppe Riva et al., Affective Interactions Using Virtual Reality: The Link Between Presence and Emotions, 10 CyberPsychology & Behav. 45, 46–47 (2007).

 [92]. Id. at 46.

 [93]. Id. at 46–48.

 [94]. Id. at 47.

 [95]. Id. at 49.

 [96]. Id.

 [97]. See Felnhofer et al., supra note 89, at 50.

 [98]. Id. at 53.

 [99]. Id. at 54. Interestingly, in contrast to these findings, an experiment performed using a desktop VR system to attempt to assess whether a simulated level of illumination could impact the affective appraisal of users in a VE failed to yield any measurable results. See Alexander Toet et al., Is a Dark Virtual Environment Scary?, 12 CyberPsychology & Behav. 363, 363 (2009). This suggests that the lack of interactivity in a non-immersive environment means that these kinds of systems may not pose the same risks as an IVE in strongly influencing user emotion through design. See id.

 [100]. See generally, e.g., Donghee Shin, Empathy and Embodied Experience in Virtual Environment: To What Extent Can Virtual Reality Stimulate Empathy and Embodied Experience?, 78 Computers Hum. Behav. 64 (2017).

 [101]. Schofield, supra note 2, at 13.

 [102]. See id.

 [103]. Id.

 [104]. See Gareth Norris, The Influence of Angle of View on Perceptions of Culpability and Vehicle Speed for a Computer-Generated Animation of a Road Traffic Accident, 20 Psychiatry, Psychol. & L. 248, 252–53 (2013).

 [105]. Id. at 250.

 [106]. Id. at 251.

 [107]. Id.

 [108]. Id. at 252 (citation omitted).

 [109]. Shin, supra note 100, at 66.

 [110]. Id.

 [111]. Id. (“By experiencing a virtual version of the story location as a witness/participant, and by feeling the perspective of a character depicted in the story, users received specialized access to the sights and sounds (and even to the feelings and emotions) associated with the story.”).

 [112]. Id. at 71. Interestingly, the study also found that, despite higher levels of immersion, users with a lower empathy trait had lower levels of reported embodiment and empathy—suggesting that certain users’ dispositions may correlate with their empathy within a virtual world. Id. at 69.

 [113]. Id. at 69 (“VR developers propose immersion but users process it.”).

 [114]. See State v. Murtha, CR03-0568598T (Conn. Super. Ct., JD Hartford, 2006); see also Neal Feigenson & Christina Spiesel, Law on Display: The Digital Transformation of Legal Persuasion and Judgment 92–103 (2009) (discussing the case in detail).

 [115]. Feigenson & Spiesel, supra note 114, at 92.

 [116]. Id.

 [117]. Id.

 [118]. Id. at 92–93.

 [119]. Id. at 93–94.

 [120]. Id.; see also NYU Press, Law on Display – Murtha Video, Part One, YouTube (Sept. 23, 2009), https://youtu.be/kWMyBg6Zt-o (showing the original police footage); NYU Press, Law on Display – Murtha Video, Part 2, YouTube (Sept. 23, 2009), https://youtu.be/J0kd-vv9DeM (showing the edited footage with the animation used at trial).

 [121]. Feigenson & Spiesel, supra note 114, at 97.

 [122]. Id. at 94–95.

 [123]. Schofield, supra note 2, at 13.

 [124]. Feigenson & Spiesel, supra note 114, at 251 n.113.

 [125]. See Konstantina Kilteni et al., The Sense of Embodiment in Virtual Reality, 21 Presence 373, 381–82 (2012).

 [126]. Id.

 [127]. Natalie Salmanowitz, Unconventional Methods for a Traditional Setting: The Use of Virtual Reality to Reduce Implicit Racial Bias in the Courtroom, 15 U.N.H. L. Rev. 117, 141 (2016) (“Instead of simply personifying an animated character in a digital game, immersive virtual environments can induce body ownership illusions, in which individuals temporarily feel as though another person’s body part is in fact their own.”).

 [128]. Konstantina Kilteni et al., Over My Fake Body: Body Ownership Illusions for Studying the Multisensory Basis of Own-Body Perception, Frontiers Hum. Neuroscience, Mar. 2015, at 1, 2.

 [129]. Matthew Botvinick & Jonathan Cohen, Rubber Hands ‘Feel’ Touch that Eyes See, 391 Nature 756, 756 (1998).

 [130]. Id.

 [131]. Kilteni et al., supra note 128, at 4.

 [132]. See generally H. Henrik Ehrsson et al., Threatening a Rubber Hand that You Feel Is Yours Elicits a Cortical Anxiety Response, 104 Proc. Nat’l Acad. Sci. U.S. 9828 (2007).

 [133]. See, e.g., Kilteni et al., supra note 128, at 3.

 [134]. Id. at 5, 8.

 [135]. Id. at 8.

 [136]. See id. at 11–12.

 [137]. Maria V. Sanchez-Vives et al., Virtual Hand Illusion Induced by Visuomotor Correlations, PLoS ONE, Apr. 2010, at 1, 3, https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0010381&type=printable.

 [138]. Id.

 [139]. Id. at 5 (“[I]n spite of the fact that they saw the virtual hand move, did not feel their hand move, nor move it, they still blindly pointed towards the virtual hand when asked to point where they felt their hand to be.”).

 [140]. Id. at 2.

 [141]. See, e.g., Kilteni et al., supra note 128, at 9.

 [142]. See Elena Kokkinara & Mel Slater, Measuring the Effects Through Time of the Influence of Visuomotor and Visuotactile Synchronous Stimulation on a Virtual Body Ownership Illusion, 43 Perception 43, 56 (2014) (“The results provide evidence that congruent multisensory and sensorimotor feedback between the unseen real and the seen virtual legs can induce sensations that the seen legs are part of the actual body.”).

 [143]. See Konstantina Kilteni et al., Drumming in Immersive Virtual Reality: The Body Shapes the Way We Play, 19 IEEE Transactions on Visualization & Computer Graphics 597, 599, 603 (2013) (“Seeing a virtual body from first person perspective, and receiving spatiotemporally congruent multisensory and sensorimotor feedback with respect to the physical body entails an illusion of ownership over that virtual body.”).

 [144]. See Domna Banakou et al., Illusory Ownership of a Virtual Child Body Causes Overestimation of Object Sizes and Implicit Attitude Changes, 110 Proc. Nat’l Acad. Sci. 12846, 12849 (2013) (“[I]t is possible to generate a subjective illusion of ownership with respect to a virtual body that represents a child and a scaled-down adult of the same size when there is real-time synchronous movement between the real and virtual body.”); see also Tabitha C. Peck et al., Putting Yourself in the Skin of a Black Avatar Reduces Implicit Racial Bias, 22 Consciousness & Cognition 779, 786 (2013) (“IVR can be used to generate an illusion of body ownership through first person perspective of a virtual body that substitutes their own body. . . . [M]ultisensory feedback, such as visuomotor synchrony as used in our experiment, may heighten this illusion.”).

 [145]. See id. at 786 (finding that embodiment of light-skinned people in darker-skinned avatars can lead to comparative reductions in implicit racial bias).

 [146]. Ehrsson et al., supra note 132, at 9830.

 [147]. See Peck et al., supra note 144, at 786.

 [148]. While the following cases are taken from the Pennsylvania and Utah Supreme Courts respectively, the applicable rules of evidence are basically identical to the Federal Rules. See Pa. R. Evid. 403 cmt. (“Pa.R.E. 403 eliminates the word ‘substantially’ to conform the text of the rule more closely to Pennsylvania law.”); see also Pa. R. Evid. 901(a) cmt. (“Pa.R.E. 901(a) is identical to F.R.E. 901(a)”); Utah R. Evid. 901(a), 2011 advisory committee note (noting that the Utah rule is “the federal rule, verbatim.”); Utah R. Evid. 403, 2011 advisory committee note (same). For a general overview and survey of the treatment of computer animations at both the state and federal level, see generally Victoria Webster & Fred E. (Trey) Bourn III, The Use of Computer-Generated Animations and Simulations at Trial, 83 Def. Couns. J. 439 (2016).

 [149]. Commonwealth v. Serge, 896 A.2d 1170, 1176 (Pa. 2006).

 [150]. Id. at 1179–80.

 [151]. Id. at 1175.

 [152]. Id.

 [153]. Id. at 1176.

 [154]. Id. at 1187. Notably, the animation was devoid of any “(1) sounds; (2) facial expressions; (3) evocative or even life-like movements; (4) transition between the scenes to suggest a story line or add a subconscious prejudicial effect; or (5) evidence of injury such as blood or other wounds.” Id. at 1183.

 [155]. Id. at 1187.

 [156]. State v. Perea, 322 P.3d 624, 635–36 (Utah 2013).

 [157]. Id. at 632.

 [158]. Id. at 635 (alterations in original).

 [159]. Id. at 635–37. The court recounted that

[t]he State objected and the district court refused to admit the animations, finding that “there [was] no foundation for the animation[s]” because Mr. Gaskill did not know “who created [them],” “the background of the people who created [them],” “how [they were] created,” or “what [the animators] relied upon in creating [them].”

Id.

 [160]. Id. at 637.

 [161]. David S. Santee, More than Words: Rethinking the Role of Modern Demonstrative Evidence, 52 Santa Clara L. Rev. 105, 135 (2012).

 [162]. See id. at 136.

 [163]. See id. at 136, 136 n.180, 140.

 [164]. In thinking about the effect of lighting, one cannot help but remember the first televised Nixon-Kennedy debate, in which Richard Nixon refused makeup for the studio camera lighting, instead applying a cheap “coat of [drugstore] Lazy Shave to hide his five o’clock shadow.” Dan Gunderman, The Story of the First TV Presidential Debate Between Nixon and Kennedy—‘My God, They’ve Embalmed Him Before He Even Died’, N.Y. Daily News (Sept. 24, 2016, 4:25 AM), http://www.nydailynews.com/news/politics/story-televised-debate-nixon-jfk-article-1.2803277. Interestingly, most of those who listened to the debate on the radio felt Nixon had prevailed, but those viewing it on television overwhelmingly found favor with Kennedy, who had subtly applied powder. See id.

 [165]. See supra Part III.

 [166]. See Webster & Bourn, supra note 148, at 441–42.

 [167]. John Selbak, Comment, Digital Litigation: The Prejudicial Effects of Computer-Generated Animation in the Courtroom, 9 High Tech. L.J. 337, 366 (1994).

 [168]. See Webster & Bourn, supra note 148, at 441–42.

 [169]. See State v. Perea, 322 P.3d 624, 635–36 (Utah 2013).

 [170]. As previously mentioned, federal courts are advised to rely on jury instructions to attempt to limit prejudice following the Advisory Committee Notes to Rule 403. Fed. R. Evid. 403, advisory committee’s notes to 1972 proposed rules. At the federal level, most jurisdictions rely on jury instructions which essentially include the following:

(1) an admonition that the jury is not to give the animation or simulation more weight just because it comes from a computer; (2) a statement clarifying that the exhibit is based on the supporting witness’s evaluation of the evidence; and, (3) in the case of an animation, a statement that the evidence is not meant to be an exact recreation of the event, but is, instead, a representation of the witness’s testimony.

Webster & Bourn, supra note 148, at 442.

 [171]. Bruton v. United States, 391 U.S. 123, 135 (1968) (“Unless we proceed on the basis that the jury will follow the court’s instructions where those instructions are clear and the circumstances are such that the jury can reasonably be expected to follow them, the jury system makes little sense.” (citation omitted)). But see Krulewitch v. United States, 336 U.S. 440, 453 (1949) (Jackson, J., concurring) (“The naive assumption that prejudicial effects can be overcome by instructions to the jury . . . all practicing lawyers know to be unmitigated fiction.”).

 [172]. Bruton, 391 U.S. at 135.

 [173]. Joel D. Lieberman & Jamie Arndt, Understanding the Limits of Limiting Instructions, 6 Psychol., Pub. Pol’y & L. 677, 686 (2000).

 [174]. See State v. Swinton, 847 A.2d 921, 945–46 (Conn. 2004).

 [175]. Id. at 937.

 [176]. Id. at 946.

 [177]. Id. at 938 (emphasis omitted).

 [178]. Id. at 942 (citation omitted).

 [179]. Id. at 942–43. These procedural factors included:

(1) the underlying information itself; (2) entering the information into the computer; (3) the computer hardware; (4) the computer software (the programs or instructions that tell the computer what to do); (5) the execution of the instructions, which transforms the information in some way—for example, by calculating numbers, sorting names, or storing information and retrieving it later; (6) the output (the information as produced by the computer in a useful form, such as a printout of tax return information, a transcript of a recorded conversation, or an animated graphics simulation); (7) the security system that is used to control access to the computer; and (8) user errors, which may arise at any stage.

Id. (citation omitted).

 [180]. See Fed. R. Evid. 702.

 [181]. This would avoid a situation like that in Perea, where the witness cannot speak to the design of the accompanying computer-generated exhibit beyond asserting that it is a fair and accurate depiction of their testimony. See State v. Perea, 322 P.3d 624, 637 (Utah 2013).

 [182]. See Cummings & Bailenson, supra note 34, at 273.

 [183]. See Mel Slater & Anthony Steed, A Virtual Presence Counter, 9 Presence: Teleoperators & Virtual Environments 413, 426 (2000) (measuring the occurrence of user breaks in presence (“BIPs”) using an HMD); see also Sanchez-Vives et al., supra note 137, at 5; Kokkinara & Slater, supra note 142, at 56, finding that:

[T]he analysis of breaks suggest that asynchronous [visuotactile] may be discounted when synchronous [visuomotor] cues are provided. . . . [W]e can predict a high or low estimated probability of the illusion solely from knowing which [visuomotor] group (synchronous or asynchronous) the person was in . . . asynchronous [visuotactile] stimulation combined with asynchronous [visuomotor] stimulation is shown to be incompatible with the illusion.

Kokkinara & Slater, supra note 142, at 56.

 [184]. For a further explanation of BIPs, see generally Maria V. Sanchez-Vives & Mel Slater, From Presence to Consciousness Through Virtual Reality, 6 Nature Reviews Neuroscience 332 (2005).

 [185]. Take, for example, when a person is deeply engrossed in watching a movie:

Every so often . . . some real world event, or some event within the movie itself, will occur that will throw you out of this state of absorption and back to the real world of the theatre: someone nearby unwraps a sweet wrapper, someone coughs, some aspect of the storyline becomes especially ridiculous, and so on.

Slater & Steed, supra note 183, at 419.

 [186]. See Fed. R. Evid. 611(a), advisory committee’s notes to proposed rule (describing the broad powers of the judge to regulate demonstrative evidence).

 [187]. See, e.g., Sanchez-Vives et al., supra note 137, at 2.

 [188]. Id. at 3.

 [189]. See Ye Yuan & Anthony Steed, Is the Rubber Hand Illusion Induced by Immersive Virtual Reality?, in 2010 IEEE Virtual Reality Conference 95, 101 (2010) (“[T]he IVR arm ownership illusion appears to exist when the virtual arm roughly appears in shape and animation like the participant’s own arm, but not when there is a virtual arrow.”).

 [190]. See Mueller & Kirkpatrick, supra note 75.

 [191]. People v. McHugh, 476 N.Y.S.2d 721, 722 (Sup. Ct. 1984).

From Volume 91, Number 3 (March 2018)

Eyewitness Identifications: Recommendations to the Third Circuit

Brady Witbeck[*]

INTRODUCTION

Just before two o’clock in the afternoon on October 22, 1991, two high school students, Chedell Williams and Zahra Howard, ascended the steps of the Fern Rock train station in North Philadelphia, planning to take a train back to their homes.[1] Seemingly out of nowhere, two men appeared, blocked the girls’ way up to the station, and demanded Chedell’s earrings. Terrified, the girls bolted in opposite directions. The two men followed Chedell. They soon caught her and tore out her earrings. Then “[o]ne of the men grabbed her, held a silver handgun to her neck, and shot her.”[2] The perpetrators fled. Chedell was pronounced dead within the hour.[3]

Police soon focused their investigation on James Dennis, who lived relatively close to the train station in the Abbotsford Homes projects. Detectives would later explain that they heard rumors that Dennis was involved in the shooting, though they were at that time “unable to identify the source of the rumors.”[4] The detectives obtained preliminary descriptions of the perpetrators from three eyewitnesses.[5] These initial descriptions did not align well with Dennis’s actual appearance. Nonetheless, a few eyewitnesses identified Dennis during subsequent photo lineups, live lineups, and the trial.[6] In presenting the government’s case, the prosecution relied heavily on these eyewitness identifications.[7] Dennis was found guilty of “first-degree murder, robbery, carrying a firearm without a license, criminal conspiracy, and possession of an instrument of a crime.”[8] He was sentenced to death.

Then, after spending twenty-four years challenging his conviction, Dennis was granted a conditional writ of habeas corpus.[9] In Dennis v. Secretary, Pennsylvania Department of Corrections, the Third Circuit Court of Appeals found that prosecutors had improperly withheld evidence that bolstered Dennis’s alibi and implicated another man in Chedell’s death.[10]

Dennis is most notable not for unearthing aberrant prosecutorial misconduct, but for Chief Judge Theodore McKee’s lengthy concurrence, which illuminated endemic failures by courts and police departments to understand and mitigate the unreliability of eyewitness identification evidence.[11] Shortly after issuing its decision in Dennis, the Third Circuit formed a task force instructed to “make recommendations regarding jury instructions, use of expert testimony, and other procedures and policies intended to promote reliable practices for eyewitness identification and to effectively deter unnecessarily suggestive identification procedures, which raise the risk of a wrongful conviction.”[12] The Task Force will rely on scientific research and is co-chaired by Chief Judge McKee.[13] By establishing this Task Force, the Third Circuit recognized that not only is there a problem with the way the criminal justice system deals with eyewitness identification evidence, but also that unreliable identifications correspond to false convictions. Chief Judge McKee’s concurrence in Dennis and the commissioning of the Task Force demonstrate that the legal system is opening up to implementing scientifically proven methods to lessen the problem of false identifications and convictions.[14]

This Note will concentrate on how system variables impact the reliability of eyewitness identifications.[15] System variables are “the procedures and practices law enforcement use to elicit eyewitness identifications.”[16] Because system variables are generally within the exclusive control of law enforcement, they present the most straightforward method through which the criminal justice system can make eyewitness identifications reliable, thus decreasing the risk of false convictions. In particular, this Note evaluates suggested reforms for photo arrays, live lineups, and jury instructions.

This Note will present simple, scientifically proven approaches to reform that will lead to a more just system and more accurate identifications and convictions. The Third Circuit Task Force should adopt recommended methods found in the volumes of psychological research written on eyewitness identification and analyzed in detail in this Note. Through a combination of legislative and judicial action, the system can be dramatically improved with minimal cost and inconvenience. Part I of this Note will examine Dennis in-depth and demonstrate how failures on the part of the criminal justice system led to false identifications and Dennis’s conviction. Part II will analyze the scientific research concerning system variables as well as the intersection of science and the criminal justice system. Part III will discuss current procedures for photo arrays, live lineups, and jury instructions, and their deficiencies. Part IV will discuss how different states have tried to solve these problems. Part V will make recommendations to the Third Circuit Task Force.

I.  DENNIS V. SECRETARY, PENNSYLVANIA DEPARTMENT OF CORRECTIONS

As Chief Judge McKee examined the data and scientific research on eyewitness identifications, he came to the conclusion that cases like Dennis are not mere anomalies; instead, they are serious miscarriages of justice that occur too frequently and should be rectified by the judiciary.[17] Even when multiple eyewitnesses identify a person, those identifications can be unreliable and “[a]lmost without exception, eyewitnesses who identify the wrong person express complete confidence that they chose the real perpetrators.”[18] Even though three people identified the defendant as the perpetrator in Dennis, the way those identifications were obtained raised “serious questions about the accuracy of those identifications.”[19] Perhaps most troubling, the jury had no way of knowing the unreliable nature of the identifications, and as a result, an innocent man spent more than twenty years on death row.[20]

On the day Chedell Williams was murdered, the police obtained initial reports from eyewitnesses to the crime.[21] Five eyewitnesses claimed they could identify the shooter.[22] These five eyewitnesses were at varying distances from the shooter when the crime took place. The eyewitnesses said the shooter wore a red sweat suit and wielded either a dull silver gun or a shiny, chrome-plated gun. One of the key eyewitnesses told police that he “would be able to identify the shooter if he saw him again,” as he was only “about six feet from the perpetrators” and looked directly at the shooter as the shooter ran away.[23]

After the police heard rumors that the shooter was Dennis, they arranged for several eyewitnesses to see if they could identify Dennis as the shooter by placing his picture in a photo array.[24] The police “compiled three arrays of eight photographs each.”[25] The first array was used to identify the shooter, the second to identify the accomplice, and the third to give the eyewitnesses an opportunity to identify a suspect. Police composed the photo arrays with pictures of seven innocent fillers and a recently taken photo of Dennis. They then individually showed the photo arrays to each eyewitness and instructed each witness to “[s]ee if you recognize anyone.”[26] Four of the nine eyewitnesses stated that Dennis looked familiar, but no eyewitness expressed a high degree of confidence in their identification at the time of the photo array.[27] Following at least two of these uncertain identifications, the photo array administrator asked the eyewitnesses if they were confident in their identifications; when responding to this question, two eyewitnesses reported greater confidence in their identifications.[28] The remaining five eyewitnesses were not able to identify the shooter with any degree of certainty.[29]

Around a month and a half later, police conducted a live lineup, which included six persons: Dennis and five fillers.[30] Only the four eyewitnesses who identified Dennis in the prior photo arrays were present at this live lineup, and “[t]he police had those four witnesses view the lineup at the same time, in the same room.”[31] The police gave instructions to each eyewitness to carefully look at all of the lineup participants to see if they recognized any one of them as the suspect, and they also instructed that none of the eyewitnesses had to make an identification if they could not recognize the suspect in the lineup.[32] Two of the eyewitnesses somewhat confidently pointed out Dennis, one eyewitness was less sure, and one—the eyewitness who initially claimed that he was so close to the perpetrator that he could easily make an identification—identified a filler.[33] Later at the trial, the prosecution put three eyewitnesses on the stand, all of whom confidently pointed at Dennis, “even though all three had expressed doubt in their earlier identifications.”[34]

II.  HISTORY OF THE SCIENTIFIC AND JUDICIAL ANALYSIS OF EYEWITNESS IDENTIFICATION EVIDENCE

The debate about what role science should play in eyewitness identification evidence is not new. In 1908, Hugo Münsterberg, a pioneering psychologist, published On the Witness Stand.[35] In it, Münsterberg profiles different judicial and police practices and analyzes them to see how the judicial system could improve with respect to eyewitness testimony.[36] To illustrate the need to incorporate science into the criminal justice system, he detailed an experience that occurred after his family home was burglarized.[37] As an eyewitness at the burglary trial, he recounted various details about the robbery.[38] But after comparing his testimony to the crime-scene evidence, Münsterberg realized there were significant errors in his testimony and that, despite his best intentions, some of his memories were distorted.[39] He emphasized that human memory is inherently faulty and that perhaps the greatest impediment to justice is not intentional lies on the part of the eyewitness but the unintentional failings of memory.[40] Finally, he spoke to the issue of how the judicial system has failed to put into practice the scientific research of the time.[41]

Münsterberg was repudiated by John Wigmore for what Wigmore viewed as an uncouth attack on the legal profession, an attack that was not justified by the scientific research Münsterberg touted.[42] Wigmore viewed Münsterberg as a popular scientist—someone more interested in fame than properly integrating science and the law.[43] While Wigmore criticized Münsterberg, he himself was a strong proponent of the use of psychology in the legal profession.[44] This debate, which took place over a century ago, demonstrates that even among those who believe science should play a greater role in evidence, it is difficult to achieve a consensus on the specifics.

Today, there are some in the legal profession who believe few or no reforms are necessary to bring science and evidence together. For example, Justice Antonin Scalia, in his concurrence in Kansas v. Marsh, rejected the idea that the way the justice system handles eyewitness identifications is deeply flawed.[45] He wrote his Marsh concurrence primarily as a response to Justice Souter’s dissent, in which Souter acknowledged the primary risk of capital punishment: that the defendant is innocent.[46] Scalia claimed that because Souter did not list an instance when an innocent person was put to death, this risk is overstated.[47] Scalia went further, claiming that DNA evidence has confirmed guilt more often than it has proved the innocence of convicted persons.[48] He also claimed that the recent reversals of false convictions should be understood not as “the operation of some outside force to correct the mistakes of our legal system,” but rather “as a consequence of the functioning of our legal system.”[49] Scalia stated that capital cases are actually given heightened judicial scrutiny, which leads to better and more accurate results, as the appeals process can be very lengthy in these cases.[50] Scalia echoed the claims of many who say that while the system may be imperfect, it still functions at a high rate of accuracy and needs minimal reform, if any.[51]

Scalia’s assessment contrasted with that of Chief Judge McKee in Dennis, in which McKee tied together the best psychological research on eyewitness identifications and recognized that the criminal justice system must improve in order to be more accurate.[52] McKee began by quoting Justice Brennan, who had stated over three decades prior that juries are likely to believe eyewitness testimony over other types of evidence, especially when the eyewitness is confident.[53] “James Dennis was sentenced to death because three eyewitnesses appeared at trial and confidently pointed their fingers at him when asked if they saw Chedell Williams’ killer in the courtroom.”[54] Because the jury was not properly instructed by the court as to how to handle eyewitness identifications and the police department was not properly trained, an innocent man was sentenced to death.[55] McKee’s purpose in writing his lengthy and thoughtful concurrence was to push the law to catch up with the science and to persuade both police departments and juries to reform.[56] Reform is critical, as mistaken identifications “‘erode public confidence in the criminal justice system as a whole.’”[57]

The Supreme Court itself recognized the problems inherent in eyewitness identifications in United States v. Wade.[58] In Wade, the Court declared that “the vagaries of eyewitness identification are well-known; the annals of criminal law are rife with instances of mistaken identification.”[59] The Court cited “the degree of suggestion inherent in the manner in which the prosecution presents the suspect to witnesses for pretrial identification” as a major factor contributing to misidentification.[60] The Court’s opinion recognized the danger that once an eyewitness has identified someone during a lineup, that eyewitness’s confidence in that identification can be artificially inflated. Improper suggestions and poor lineup construction during the lineup process can taint an entire trial.[61]

III.  PHOTO ARRAYS, LINEUPS, JURY INSTRUCTIONS, AND SYSTEM VARIABLES

In the United States, eyewitnesses identify some 77,000 suspects annually.[62] Often, juries place great weight on eyewitness identifications and, accordingly, the identifications provide powerful evidence against a defendant.[63] Despite the importance of eyewitnesses, their accounts are generally less accurate than most people—including judges, jurors, and attorneys—would assume.[64]

In one study, 590 participants were tested to determine if, after having spoken to a woman for fifteen seconds, they could later identify that same woman.[65] During a live lineup where the woman was present, only forty-nine percent of the participants were able to correctly identify her.[66] When the woman was absent from the lineup, sixty-two percent of participants correctly refrained from making an identification, while the remaining thirty-eight percent incorrectly identified someone else.[67]

Police departments generally use three types of methods to obtain identifications from eyewitnesses: showups, photo arrays, and live lineups.[68] But how police departments administer these three methods varies greatly and lacks uniformity across jurisdictions.[69] With thousands of police departments and courts, it is difficult to obtain a clear picture of how different jurisdictions obtain eyewitness identifications.[70] Many police departments have no standing procedures or policies, and many police officers are not aware of how system variables, which police control, can influence the reliability of this type of evidence.[71]

This Section describes how photo arrays, live lineups, and jury instructions function and how these processes often fall short of their objective to obtain reliable identifications. Showups, ad hoc procedures where law enforcement officers bring eyewitnesses to a location to show them a suspect, will not be discussed at length.[72]

A.  Photo Arrays and Live Lineups Defined

Photo arrays and live lineups constitute important ways in which police can obtain eyewitness identifications.[73] Police regularly use both photo arrays and live lineups in their investigative efforts.[74] Though live lineups are generally considered more accurate than photo arrays, they are conducted less frequently.[75] In most photo arrays, the eyewitness is presented with a number of photographs and instructed to identify the photo of the person who the eyewitness believes committed the crime.[76] A defendant does not have the right to have an attorney present during a photo array.

Like photo arrays, live lineups are used by police either to determine or confirm the identity of a suspect.[77] In a live lineup, an eyewitness is presented with a number of people and asked to identify the person the witness believes to be the suspect.[78] Live lineups can occur either before or after an indictment.[79] Most live lineups in the United States contain around five participants.[80] Eyewitnesses either view the lineup participants sequentially or simultaneously.[81] Sequential lineups compel the eyewitness to make an absolute judgment of identity, while simultaneous lineups allow the eyewitness to make a relative judgment of identity.[82]

In a sequential lineup the eyewitness views the suspect and fillers one at a time. . . . In the original sequential lineup for each person (i.e., the suspect and fillers) the eyewitness either identifies the person as the culprit or not. If the eyewitness makes an identification the procedure ends. If no identification is made then the next person is shown to the eyewitness.[83]

In a simultaneous lineup, the eyewitness is presented with all the lineup participants at one time.[84]
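The procedural difference between the two judgment types can be sketched in a few lines of Python. This is an illustrative model only: the `witness_says_yes` and `match_score` callables are hypothetical stand-ins for the witness’s memory, not anything drawn from the studies cited in this Note.

```python
def sequential_lineup(members, witness_says_yes):
    """Sequential procedure: members are shown one at a time, forcing an
    absolute yes/no judgment about each; the first positive identification
    ends the procedure."""
    for member in members:
        if witness_says_yes(member):
            return member  # identification made; procedure ends
    return None  # the witness identified no one

def simultaneous_lineup(members, match_score):
    """Simultaneous procedure: all members are viewed at once, inviting a
    relative judgment. This sketch models the risk discussed below: the
    witness picks whoever best matches memory, even if no one matches
    well. (A real witness may, of course, decline to choose anyone.)"""
    return max(members, key=match_score)
```

The `max` call makes the relative-judgment risk visible: the simultaneous procedure always has a “best match,” whether or not the perpetrator is actually in the lineup.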

B.  System Variables and Accuracy

As Chief Judge McKee stated in Dennis, system variables are within the control of law enforcement.[85] Because police departments control the practices and procedures used to acquire eyewitness identifications, the Third Circuit Task Force (“Task Force”) should examine the scientific research concerning the accuracy of those procedures. Studies have identified simple, cost-effective ways to adjust system variables to improve the reliability of eyewitness identification evidence.

Though photo arrays and live lineups are most accurate when administered blindly—that is, when the person administering the lineup does not know the identity of the suspect—very few police departments conduct blind lineups and arrays.[86] In one experiment, students were randomly assigned to play the role of either a lineup administrator or a mock eyewitness.[87] The mock eyewitnesses were shown a video of a theft in which they were exposed to the perpetrator’s face for twenty-five seconds.[88] One group of lineup administrators was told the identity of the suspect, while the other group was not.[89] The researchers found that the non-blind administrators often smiled when the mock eyewitness viewed the suspect in the photo array and smiled after the eyewitness identified the suspect.[90] The non-blind photo arrays resulted in significantly more false identifications than the photo arrays that were administered blind.[91] The researchers also found that the non-blind administrators affected eyewitnesses’ confidence in their selections.[92]

Feedback from a non-blind administrator can manipulate eyewitnesses’ confidence in their identifications.[93] This is true even when the eyewitness mistakenly identifies the wrong person; feedback confirming an eyewitness’s mistaken identification impairs the eyewitness’s memory of the original perpetrator.[94] When an administrator makes statements like “[w]e thought this might be the one” or “[t]hat’s the one you picked out in the photo,” or gives even more subtle, non-verbal cues, eyewitnesses’ confidence can increase and their ability to recognize the actual suspect can decrease.[95] “Relative to a no feedback condition, witnesses who received good-memory feedback expressed higher post-identification confidence in a subsequent lineup identification, whereas those who received poor-memory feedback evinced lower confidence.”[96]

Jurisdictions, as well as experts, disagree as to the advantages of using sequential lineups instead of simultaneous lineups. Some jurisdictions have reformed their procedures in order to have eyewitnesses make absolute judgments of identity, while others have cited evidence claiming that relative judgments of identity are more reliable.[97] A recent study that purports to go against the grain of evidence supporting simultaneous lineups found that eyewitness identifications based on relative judgments are less reliable than those based on absolute judgments.[98] “[A] witness using an absolute judgment makes an identification of a lineup member if the match between that lineup member and the witness’s memory of the perpetrator is sufficiently high,” while a relative judgment can be made when the match is relatively better than that of any other member of the lineup.[99] The study also found that “witnesses’ reliance on relative judgments undermines the reliability of the identification evidence, and increases the relative risk of a false identification that can ultimately lead to a wrongful conviction.”[100] Despite this recent study, the scientific community remains somewhat divided on this issue, with some studies claiming that there is little difference in reliability between the two approaches.[101]

Lineup instructions given to eyewitnesses before they make identifications impact the reliability of any identification that follows.[102] Biased instructions occur when the lineup administrator “fails to explicitly instruct the eyewitness that the perpetrator may not be present in the lineup and that it is permissible to identify no one.”[103] In some instances, biased instructions “compel[] witnesses to adopt a lower criterion for accepting their sense of recognition of the most familiar-looking lineup member as correct . . . and thereby enhances their confidence in making a positive identification of that lineup member,” and can also artificially increase eyewitnesses’ confidence in their identifications because they may assume the suspect is in the lineup.[104] In one study, participants viewed a video of a mock theft and were instructed to identify a suspect from both a thief-present and a thief-absent live lineup.[105] One group was given biased instructions before they attempted to make an identification, while the other group was not. The mock eyewitnesses’ confidence was then measured. The results showed that “[b]iased instructions and positive feedback increased confidence and ratings of eyewitnessing conditions.”[106] The study also found that eyewitnesses’ confidence in their identifications only modestly relates to the accuracy of those identifications.[107]

C.  Jury Instructions

Jury instructions that provide the jury with information on how to use eyewitness identifications could improve a jury’s evaluation of eyewitness evidence, thus improving the deliberation process. Instructions can inform the jury how memory works and how an identification was obtained, and can dispel the myth of the infallibility of the identification process.[108] Jury instructions regarding eyewitness identifications and their use in trials typically contain some qualifications about their accuracy, but these instructions are often generic and do not properly convey scientific realities.[109] Some experts claim that most current jury instructions do not increase a jury’s sensitivity to possible errors in eyewitness testimony.[110] This is because jurors weigh eyewitness evidence too heavily and because they are “often uncritical of the reliability of the testimony.”[111]

For example, the instructions received by the jury in Dennis were “plain vanilla” and unhelpful.[112] The instructions were long, confusing, and did not include any “explanation of the relevant system or estimator variables that so crucially impact the reliability of witness identifications.”[113] Jurors are often not aware, or at least do not receive instructions from the court, of the possible inaccuracies of eyewitness testimony generally and of eyewitness identifications specifically.[114] Studies have shown that jurors do not understand how memory functions or how memory can be influenced and manipulated.[115] Juries have limited knowledge about memory and rely on eyewitness confidence, an eyewitness’s memory for minor details, and the consistency of an eyewitness’s testimony, while ignoring the impact system variables have on the reliability of eyewitness identifications.[116] The myths that people can never forget a face and that an encounter with an armed suspect enhances one’s ability to identify that suspect can lead a jury to overvalue an identification during its deliberations.[117]

IV.  DIFFERING JURISDICTIONAL APPROACHES

In the past decade, a few jurisdictions have reformed their procedures with the goal of improving the reliability of eyewitness identification evidence. This Section will discuss three states in particular: New Jersey, Oregon, and North Carolina. These jurisdictions used scientific research to improve how police departments obtain identifications and to ensure that courts admit into evidence only those eyewitness identifications that bear indicia of truth and reliability. New Jersey and Oregon addressed eyewitness identification procedures through decisions of their respective supreme courts.[118] North Carolina’s legislature instituted reforms by statute.[119] In addition to analyzing the reforms adopted by these states, this Section will evaluate proposals from experts in law and psychology who have suggested procedures and practices to increase the reliability of eyewitness identification evidence.

A.  New Jersey

Recently, the Supreme Court of New Jersey attempted to improve the reliability of eyewitness identification evidence with its decision in State v. Henderson.[120] The court overhauled its test for the admission of eyewitness identification evidence.[121] The decision called for blind administration of photo arrays and live lineups, new pre-lineup instructions, new rules for lineup construction, and new record-keeping procedures.[122] The court also determined that jury instructions needed to improve in order to better equip juries for deliberation.[123]

In Henderson, an eyewitness to a crime was shown a photo array that included eight photographs—one of the suspect and seven of innocent fillers.[124] Before the photo array was administered, the eyewitness was given instructions that were standard in New Jersey police departments.[125] He was informed that an administrator would show him photos sequentially and that the perpetrator’s photo was not necessarily included in the array.[126] The eyewitness also was instructed that the suspect could have gained or lost weight since the incident and that facial hair could easily be altered.[127] The photos were shown to the eyewitness in an order that was random even to the administrator.[128] During the photo array, the eyewitness narrowed the photos down to two, but he could not make a clear identification.[129] Police later testified that at this point in the photo array the eyewitness was excited, so they removed him from the room, calmed him down for one to five minutes, and then showed him the eight photos again.[130] Police claimed that the eyewitness was then quickly and confidently able to identify the suspect. The eyewitness later testified that he felt pressured to make an identification and that police pushed him to identify the suspect.[131]

The jury instructions provided at trial did not inform the jury about the influence that suggestive police behavior can have on the reliability of identifications.[132] The instructions were long and confusing, and they included scientific language most likely unfamiliar to jurors; to determine whether the identification was reliable, the instructions asked jurors to consider a number of competing and seemingly contradictory factors.[133]

To improve the reliability of eyewitness identifications, the New Jersey Supreme Court addressed system variables within the control of the criminal justice system that it believed would best improve the reliability of identification evidence.[134] The court determined that because even subtle, non-intentional suggestions by police during the identification process can influence memory, photo arrays and live lineups should be administered blindly.[135] Because police departments have limited resources, the court suggested that departments could use the “envelope method” for the administration of photo arrays.[136] With the envelope method, “an officer who knows the suspect’s identity places single lineup photographs into different envelopes, shuffles them, and presents them to the witness. The officer/administrator then refrains from looking at the envelopes or pictures while the witness makes an identification.”[137] This method would decrease the likelihood of improper suggestion by the police.[138]

The court ordered that before administering a photo array or a live lineup, New Jersey police must always instruct the eyewitness that the person who committed the crime may or may not be present and that the eyewitness should not feel pressure to make an identification.[139] In order to decrease the possibility of an eyewitness simply guessing the identity of the suspect, every lineup should be composed of fillers who look similar to the suspect, so the suspect does not stand out.[140] This is so an eyewitness’s confidence is not artificially inflated by a perception that the identification process was “easy.”[141] There should be at least five fillers in a live lineup, and lineups should not feature more than one suspect.[142] The court also reminded police departments that all lineups should be recorded and preserved so that courts can later determine if the lineup was properly constructed.[143]

In order to avoid improper feedback from police that could inflate eyewitnesses’ confidence in their identification, the court held that “law enforcement officers should make a full record—written or otherwise—of the witness’ statement of confidence once an identification is made.”[144] Officers should not allow eyewitnesses to view the suspect multiple times, as this can artificially increase confidence in their identification.[145] The New Jersey Supreme Court took no position on whether police departments should favor sequential or simultaneous lineups.[146] The court believed that there was insufficient scientific evidence to show a preference for either and that more studies needed to be conducted before the court could state a preference.[147]

To better help jurors understand the eyewitness identification process, the court reformed jury instructions.[148] Laypeople, on the whole, do not understand how memory works.[149] The court identified the common misconceptions that memory is similar to a video recording and that memory cannot be contaminated or distorted by outside influence.[150] Juries also tend to give disproportionate weight to the confidence of the eyewitness.[151] To better equip the jury to evaluate eyewitness identifications, jury instructions need to clearly and comprehensively inform the jury about the science of eyewitness identification and the nature of memory.[152] At the same time, instructions must remain helpful to jurors and should not overwhelm them.

B.  Oregon

In State v. Lawson, the Supreme Court of Oregon overhauled its test for determining the admissibility of eyewitness identifications.[153] Lawson consolidated two cases in which the admissibility of eyewitness identification evidence was at issue.[154] Two defendants were separately tried and convicted, at least in part, because of eyewitness identifications that “had been subject to an unduly suggestive police procedure in the course of identifying” the defendants.[155]

The test Oregon courts used at the time of the defendants’ trials to evaluate the admissibility of eyewitness identification evidence was fairly permissive, and it failed in its purpose of keeping suggestive and inaccurate identifications out of evidence.[156] The test consisted of generic, unhelpful factors: whether the time between the event and the identification was short, whether the eyewitness was certain, and whether the eyewitness had a chance to clearly see the suspect before the identification was admitted.[157]

In one of the cases consolidated in Lawson, a victim was shot in the chest and admitted to the hospital, where she was questioned by police as to the identity of her attacker.[158] The victim was shown a black-and-white photo array while heavily medicated, sedated, and restrained in her hospital bed.[159] Moreover, because her injuries necessitated a breathing tube, the victim could only respond to police questioning by nodding or shaking her head. At first, the victim did not identify anyone from the photo array; however, she eventually nodded “yes” to leading questions regarding the suspect’s identity.[160] The victim later had no recollection of this interview.[161]

Approximately two weeks later, when the victim could speak, she said that she was not able to identify the person who shot her; the following month, she was not able to pick the defendant out of another photo array, but shortly thereafter the police informed her that she had identified someone during her stay in the hospital.[162] After hearing this, the victim said she recognized the man police had identified as a suspect; however, she stated that she was not certain he was the perpetrator.[163] At a much later date, and after police repeatedly exposed the victim to the suspect’s photo, the victim identified the suspect at a live lineup and even testified at trial that she “always knew it was him.”[164] Based in part on this evidence, the defendant was convicted.[165] On appeal, the Oregon Supreme Court held that the identification should not have been admitted into evidence as it was subject to suggestive police procedure.[166]

In the case, the Oregon Supreme Court examined scientific research on system variables that the court believed could help prevent false or unreliable identifications from being admitted into evidence.[167] Based on its examination of the science, the court mandated judicial and police department reforms.[168] Additionally, the Lawson court shifted the burden on the question of suggestiveness from the defendant to the prosecution.[169]

In order to improve the reliability of eyewitness identification evidence, the Oregon Supreme Court found that several system variables, which lie within the exclusive control of the criminal justice system, needed to improve.[170] The court called for the blind administration of photo arrays to prevent an administrator from improperly influencing an identification.[171] When police administer photo arrays or live lineups, the administrator should inform eyewitnesses that they do not have to make an identification, as the perpetrator may not be in the lineup or array.[172] The court called for live lineups to be constructed using fillers who look physically similar to the suspect so the suspect does not stand out.[173] Furthermore, live lineups and photo arrays in Oregon must now be conducted sequentially so that the eyewitness makes an absolute judgment of identity instead of a relative judgment.[174]

The fact that the victim in Lawson viewed the suspect multiple times was a major factor in determining that the identification was unreliable.[175] Because police repeatedly exposed the victim to images of the suspect, she became increasingly familiar with his face, so much so that she could eventually identify him with confidence, even though she was initially unsure of the perpetrator’s identity.[176] For this reason, after Lawson, Oregon police are required to avoid multiple viewings when conducting photo arrays and live lineups.[177]

The opinion did not elaborate in depth on how Oregon courts should craft jury instructions for evaluating eyewitness identifications, but the court suggested that future jury charges should refer to the system variables that influence reliability.[178] The court cited an Oregon evidence rule stating that identifications must be helpful to the trier of fact.[179] Identifications, when admitted, therefore should not confuse the jury but should aid jurors in their fact-finding, which provides another reason to improve the reliability of eyewitness identifications.[180]

C.  North Carolina

While legislatures lack some of the courts’ sophisticated legal experience, legislation regulating police conduct can be an effective way to quickly and authoritatively adjust system variables. North Carolina took this approach with the North Carolina Eyewitness Identification Reform Act (“the Act”).[181] The Act, passed in 2007, attempts to incorporate scientific advances in the field of eyewitness identification to better assure reliability and to bolster the truth-finding function of North Carolina’s criminal justice system.[182] To further this goal, it instructs police departments on how to administer identifications according to the best available practices.[183]

The Act calls for independent administrators, who are not aware of the suspect’s identity, to carry out both photo arrays and live lineups.[184] The independent administrator will give instructions that inform the eyewitness that the perpetrator may or may not be in the lineup or photo array, and will also state that the investigation does not hinge on the eyewitness making an identification, so the eyewitness should not feel undue pressure to make one.[185]

Under the Act, both photo arrays and live lineups should contain at least five innocent fillers who resemble the suspect.[186] Lineups and photo arrays with more than one suspect are prohibited, and eyewitnesses are separated from others who are making an identification to prevent them from conferring with one another before or during the live lineup or photo array.[187] Eyewitnesses are not to be provided any information about the suspect, and police must make a video recording of the process, or an audio recording if a video recording is not feasible.[188] The Act also contemplates that lineups may be administered by a computer program, an alternative method that keeps the administrator from seeing the photo in front of the witness.[189]

In order to facilitate these reforms, law enforcement officers are required to go through training programs so that they know how to conduct lineups and photo arrays in compliance with this statute.[190] The Act calls for the creation of materials and classes to facilitate the training of law enforcement officers.[191] Two preexisting North Carolina police-training agencies, the North Carolina Criminal Justice Education and Training Standards Commission and the North Carolina Sheriffs’ Education and Training Standards Commission, were made responsible for creating these programs and materials.[192]

D.  Scholarly Proposals

Legal scholars and scientific bodies have proposed reforms that often go further than the changes made in states like New Jersey, Oregon, and North Carolina. For example, the National Academy of Sciences issued a report addressing the reliability of eyewitness identification evidence.[193] The academy’s goal was to digest the current scientific research on the subject and present it to law enforcement and the legal community, and it called for greater cooperation between the law enforcement and scientific communities so that identification procedures can improve across the country.[194] Scholars hope that training law enforcement on how memory works, and on how officers can unintentionally influence identifications, will allow police to see why reform is necessary.[195]

The report made recommendations as to how jurisdictions can improve the reliability of eyewitness testimony.[196] It called for blind administration of lineups, uniform and “easily understood instructions” to be provided to the eyewitness prior to an identification, and careful documentation of eyewitnesses’ confidence in their identifications.[197] These instructions should inform the eyewitness that “the perpetrator may or may not be in the photo array or lineup and that the criminal investigation will continue regardless of whether the witness selects a suspect.”[198] The report suggested that, due to a lack of consensus as to the merits of sequential versus simultaneous lineups, neither method should be preferred.[199]

The academy acknowledged that some police departments are hesitant to make changes that would require them to stretch their limited resources.[200] In response, the committee suggested that “departments consider procedures and new technologies” that would alleviate this concern.[201] For example, if a blind administrator is not available, a department could use either a “computer-automated presentation of lineup photos”[202] or the envelope method employed in New Jersey.[203] The eyewitness identification process should also be videotaped, even though doing so could increase costs and burden eyewitnesses’ privacy interests.[204] Where these concerns arise, departments can videotape the process non-intrusively, and many departments already have the technology that would allow them to document these procedures.[205]

The report called for the “use of clear and concise jury instructions” to assist jurors in their fact-finding mission.[206] Jury instructions can convey the most important underlying aspects of the identification process in clear language.[207] This would allow the jury to properly give weight to eyewitness identification evidence in its deliberations.[208] “Appropriate legal organizations, together with law enforcement, prosecutors, defense counsel, and judges, should convene a body to establish model jury instructions regarding eyewitness identifications.”[209]

Going forward, the academy recommended that a national research initiative be established to increase our understanding of the science of eyewitness identifications.[210] The research initiative would allocate future funds for research, formulate new policy positions, review research, advocate future policy changes, and provide formal assessments of reforms across the country.[211]

Separately, Dan Simon, in his book In Doubt, proposed a series of reforms intended to fix the systematic errors inherent in the identification process.[212] These reforms would “provide best-practice protocols and are directed at the twofold goal of maximizing the accuracy of identifications and the transparency of the procedures used to elicit them.”[213] Simon’s reforms include:

2. . . . [L]ive and video lineups should be preferred over photographic arrays.

3. Suspects should not be placed in identification procedures absent an appreciable threshold of guilt.

4. Prior to the lineup, witnesses should not be exposed to any identifying information about the suspect from any source.

5. Lineups should be conducted as soon as possible after the witnessed event.

6. Lineups should include only one suspect and five or more fillers whose innocence is beyond doubt.

7. Fillers should match the witness’s description of the perpetrator and not be noticeably dissimilar from the suspect.

8. The suspect should be allowed to determine his place in the lineup and to change places between lineups.

9. The witness should be instructed that the perpetrator “may or may not be” in the lineup, and that it is appropriate to respond “perpetrator is not present,” and “don’t know.”

10. Targets should be presented sequentially (rather than simultaneously).

11. All identification procedures should be “double blind”: the administrator must be kept unaware of the identity of the suspect; the witness should be informed that the administrator does not know the suspect’s identity.

12. The administrator should refrain from any communication or behavior that could be interpreted as suggestive or revealing of the identity of the suspect.

13. The witness should announce his recognition or nonrecognition, followed immediately by a confidence statement. The witness should not be given any feedback before completing the statement.

14. The time it took the witness to announce recognition should be measured and recorded. . . .

16. Witnesses who at any time pick someone other than the suspect should not be allowed to provide any identification testimony about the suspect.

17. Witness [sic] who fail to identify the suspect, make a hesitant decision, or express low confidence at the initial identification should be deemed to have a weak memory of the suspect.

18. The procedure should be recorded in its entirety, preferably on videotape. Recording should include the images used and the instructions given. The witness should be videotaped throughout the procedure.[214]

Simon also suggests that the composition of lineups be computerized to remove the element of human error from the equation altogether.[215]

Simon recognizes that implementing most of these ideas is uncontroversial, but he also acknowledges an inherent trade-off “between the intended objective of reducing false identifications and the unintended effect of losing correct identifications.”[216] Despite this, Simon argues that the proposed reforms would provide a net gain for the judicial system.[217] Providing a complete record of identification procedures is critical both for minimizing “the effects of memory decay, contamination, and any other biases induced by the investigation and pretrial procedures” and for “providing fact finders and other decision makers with the best possible information for assessing the reliability of the identifications.”[218]

V.  RECOMMENDATIONS FOR THE THIRD CIRCUIT TASK FORCE

The Third Circuit should borrow the best and most practicable reforms undertaken by North Carolina, New Jersey, and Oregon. These states have taken steps toward integrating scientific research into the judicial system, thus making eyewitness identification evidence more reliable. The Task Force has the opportunity to combine the best ideas of these states to lower the risk of wrongful convictions in the Third Circuit. These reforms can further serve as a model for other jurisdictions to reform their policies and procedures. Though live lineups generally produce more reliable evidence than photo arrays, the Task Force should recommend reforms for both photo arrays and live lineups given the impracticality of having a live lineup for every identification. The Task Force should also address jury instructions.

Because the composition of a lineup can greatly influence the reliability of the resulting identification, the Task Force should provide clear guidelines on how and when lineups should be conducted. Lineups and photo arrays should be conducted as close in time as possible to the crime, so the eyewitness is more likely to remember the perpetrator. In many of the cases discussed above, police conducted lineups months or even a year after the event occurred, which led to memory decay and, ultimately, false identifications.

The Task Force should adopt a policy similar to that of North Carolina, which requires that live lineups include at least five fillers.[219] These fillers should be similar in race, height, age, and facial structure to the suspect. Ideally, lineups will be composed by a computer program to ensure similarity among the lineup participants. If the suspect has a unique feature, such as a mole or a tattoo, lineup administrators should select fillers who share that feature or alter the filler photos so that the feature is present in all or most of the photographs.[220] Photos of the suspect should not be more than a year old[221] and, whenever possible, should not depict the suspect with facial hair different from what he had at the time of the incident.[222] Lineups should never include more than one suspect.[223] As in New Jersey, all live lineups and photo arrays should be recorded and preserved, so that if the reliability of the identification is brought into question, a court can use the recording to help determine whether the identification was reliable.

Whenever possible, photo arrays and live lineups should be administered in isolation, away from third parties who could influence the evidence. Lineup administrators should select quiet, separate areas of police precincts and ensure that the eyewitness is separated from other police officers and other eyewitnesses. The lineup administrator should ensure that the eyewitness does not have any access to case materials, including “information about the case, [and] the progress of the investigation.”[224] Eyewitnesses should not be allowed to see images of the suspect outside of the lineup administration, including wanted posters of the suspect that may be hanging in the police department where the photo array or live lineup is being administered.[225]

The Third Circuit should mandate that police departments administer both photo arrays and live lineups blindly. Blind administration increases the accuracy of eyewitness identifications and lowers the risk of feedback from the lineup administrator.[226] Because police resources are limited, the Task Force should recommend that, even where the photo array administrator is not blind to the suspect’s identity, the police department follow the envelope method employed in New Jersey. This method prevents the administrator from seeing the photographs before the eyewitness makes an identification, removing the risk that the administrator could influence the eyewitness beforehand.[227] However, because the administrator could still provide feedback to the eyewitness after the identification, the envelope method should be used only when a fully blind procedure is impractical.

The Task Force should recommend that police departments change the way they instruct eyewitnesses prior to administering either a photo array or a live lineup. Because biased instructions lead to false identifications and artificially increased confidence in those identifications, it is critical that police departments give uniform, unbiased instructions to eyewitnesses.[228] Lineup administrators should explicitly state that the suspect may or may not be in the photo array or live lineup and that the entire case does not rely on the eyewitness making an identification. Police should try to ensure that eyewitnesses do not feel pressure to make an identification and that they are aware they can say that they do not know if the suspect is in the lineup. These eyewitness instructions are important because an eyewitness should not assume that the suspect is in the lineup. Furthermore, multiple viewings of the suspect by eyewitnesses should not be allowed, so as not to inflate their confidence in the identification.

Once an eyewitness makes an identification, police should immediately record the level of confidence the eyewitness has in that identification. Although juries often overvalue eyewitness confidence, it can serve a role at trial, especially if it is measured immediately after an identification.[229] Eyewitnesses’ confidence in their identifications can be used as a factor in determining whether the evidence is admissible. When an eyewitness identifies a suspect without hesitation and without prompting by the lineup administrator, that identification is more likely to be reliable. The eyewitness should also confirm the identification in writing. This provides an additional failsafe to ensure that the eyewitness was not coerced into making an identification and creates a written record of the statement of confidence.

Because the scientific community is split on whether sequential or simultaneous viewing of a lineup results in the most reliable identifications, the Task Force should not state a preference for either.

The Task Force should improve existing jury instructions. If a jury were equipped to properly weigh eyewitness evidence and were aware of how and why some identifications are unreliable, police departments could internally strive to improve system variables, knowing that a jury may discard improperly obtained identification evidence. Some jurisdictions use expert testimony to inform the jury about eyewitness identifications; however, this method has generally proven unsuccessful.[230] Because current instructions do not assist the jury in properly evaluating eyewitness identifications, new, standard instructions should be implemented.

As in Dennis, the jury instructions in New Jersey prior to judicial reform were confusing and muddled.[231] This led the New Jersey Supreme Court to implement new jury instructions. Because the Third Circuit’s jury instructions are similar to those previously used in New Jersey—in that they are too long and do not explain simply how eyewitness identifications can be inaccurate and unreliable—the Task Force should also implement better jury instructions.[232] Proper instructions give juries a tool to compensate for their limited knowledge of how memory functions. Instructions should encourage a jury to examine various factors to determine not only whether the police procedures leading up to the identification were proper, but also whether the eyewitness’s memory shows indicia of reliability. As in New Jersey, the instructions should caution juries against assigning undue weight to eyewitness confidence; at the same time, the Task Force should be wary of overwhelming the jury with scientific information.

Finally, the Third Circuit Task Force should recommend a training program for police departments that will help implement these reforms. When implementing its legislative reforms, North Carolina recognized that training was essential to increasing the reliability of eyewitness identification evidence.[233] Police officers should be instructed that following these procedures will not necessarily result in fewer convictions, but will help ensure that investigations are conducted in a manner most conducive to truth-finding. The Task Force could appoint a team of experts to travel to conferences and individual police departments to train police on how best to implement the proposed reforms. To ensure compliance, the Task Force should require periodic reports from both trial courts and police departments on how the proposals are being implemented and whether any modifications to the reforms will be necessary in the future. The Task Force should reconvene in five years to reexamine the scientific evidence and suggest any further changes.

CONCLUSION

The investigative procedures used in Dennis that caused such an unjust outcome are employed in many jurisdictions across the country. The Third Circuit Task Force on Eyewitness Identifications has been presented with a unique opportunity to examine every facet of the eyewitness identification process and to recommend changes that will decrease the risk of false convictions. New Jersey, Oregon, and North Carolina, among others, provide a path forward that the Task Force should follow. Through blindly administered lineups, proper pre-lineup instructions, careful lineup construction, helpful jury instructions, and the other reforms analyzed above, the Third Circuit can serve as an example of how scientific research can be incorporated into the justice system to produce fair and just results.

 


[*] *. Executive Articles Editor, Southern California Law Review, Volume 91; J.D. Candidate 2018, University of Southern California Gould School of Law; B.A. History 2015, Brigham Young University. I would like to thank Professor Dan Simon and Professor Sam Erman for valuable guidance and feedback on earlier drafts of this note. In addition, I would like to thank the staff and editors of the Southern California Law Review for their excellent work.

 [1]. Dennis v. Sec’y, Pa. Dep’t of Corr., 834 F.3d 263, 269 (3d Cir. 2016).

 [2]. Id.

 [3]. Id.

 [4]. Id.

 [5]. Id. at 270.

 [6]. Id. at 270–71.

 [7]. Id.

 [8]. Id. at 275.

 [9]. Id. at 269.

 [10]. Id. at 275, 287.

 [11]. See generally id. at 313–44 (McKee, C.J., concurring).

 [12]. Order Establishing Third Circuit Task Force on Eyewitness Identifications (Sept. 9, 2016), http://www.ca3.uscourts.gov/sites/ca3/files/TFEyewitnessIdOrder_11042016.pdf.

 [13]. Id.

 [14]. See Dennis, 834 F.3d at 313 (McKee, C.J., concurring); Order, supra note 12.

 [15]. While there are other factors that contribute to unreliable eyewitness identification evidence, this Note will only focus on system variables.

 [16]. Dennis, 834 F.3d at 321 (McKee, C.J., concurring).

 [17]. Id. at 313–16.

 [18]. Id. at 315.

 [19]. Id.

 [20]. Id. at 316.

 [21]. Id. at 317.

 [22]. Id.

 [23]. Id.

 [24]. Id. at 318. Police never clarified where the rumors originated or why the detectives decided to further investigate the rumors. Id.

 [25]. Id.

 [26]. Id.

 [27]. Id.

 [28]. Id.

 [29]. Id. at 319.

 [30]. Id.

 [31]. Id.

 [32]. Id. at 320.

 [33]. Id.

 [34]. Id.

 [35]. Hugo Münsterberg, On the Witness Stand: Essays on Psychology and Crime (1908).

 [36]. See generally id.

 [37]. Id. at 39–44.

 [38]. Id. at 39.

 [39]. Id. at 39–40.

 [40]. See id. at 40, 67–68.

 [41]. See generally id.

 [42]. See James M. Doyle, Ready for the Psychologists: Learning from Eyewitness Errors, 48 Ct. Rev. 4, 4–5 (2012).

 [43]. See id. at 4.

 [44]. Id.

 [45]. See Kansas v. Marsh, 548 U.S. 163, 182–99 (2006) (Scalia, J., concurring).

 [46]. Id. at 185–86.

 [47]. Id. at 188.

 [48]. Id.

 [49]. Id. at 193.

 [50]. Id. at 198.

 [51]. Id.

 [52]. See generally Dennis v. Sec’y, Pa. Dep’t of Corr., 834 F.3d 263, 313–44 (3d Cir. 2016) (McKee, C.J., concurring).

 [53]. Id. at 313.

 [54]. Id.

 [55]. Id.

 [56]. Id.

 [57]. Id. at 316 (quoting Comm. on Sci. Approaches to Understanding and Maximizing the Validity and Reliability of Eyewitness Identification in Law Enforcement and the Courts et al., Identifying the Culprit: Assessing Eyewitness Identification 22 (2014)).

 [58]. See generally United States v. Wade, 388 U.S. 218 (1967).

 [59]. Id. at 228.

 [60]. Id.

 [61]. Id. at 231–33.

 [62]. Dan Simon, In Doubt: The Psychology of the Criminal Justice Process 50–51 (2012).

 [63]. Id.

 [64]. Id. at 51, 53.

 [65]. A. Daniel Yarmey, Eyewitness Recall and Photo Identification: A Field Experiment, 10 Psychol. Crime & L. 53, 53 (2004).

 [66]. Id.

 [67]. Id.

 [68]. Simon, supra note 62, at 51.

 [69]. Id.

 [70]. Id.

 [71]. Id. at 76.

 [72]. For an in-depth discussion of show-up procedures, see id. at 70–71, 77–78.

 [73]. Id. at 51–52.

 [74]. Id. at 52.

 [75]. Id. at 69.

 [76]. Id. at 51–52.

 [77]. Id. at 70.

 [78]. Id.

 [79]. Id. at 81.

 [80]. Id. at 72.

 [81]. Id. at 71.

 [82]. Id.

 [83]. Daniel B. Wright, The Impact of Eyewitness Identifications from Simultaneous & Sequential Lineups, 15 Memory 746, 748 (2007).

 [84]. Simon, supra note 62, at 71.

 [85]. Dennis v. Sec’y, Pa. Dep’t of Corr., 834 F.3d 263, 321 (3d Cir. 2016) (McKee, C.J., concurring).

 [86]. Steve D. Charman & Vanessa Quiroz, Blind Sequential Lineup Administration Reduces Both False Identifications and Confidence in Those False Identifications, 40 Law & Hum. Behav. 477, 477, 483–84 (2016).

 [87]. Id. at 477.

 [88]. Id. at 480.

 [89]. Id. at 477.

 [90]. Id.

 [91]. Id.

 [92]. Id. at 484.

 [93]. Laura Smalarz & Gary L. Wells, Confirming Feedback Following a Mistaken Identification Impairs Memory for the Culprit, 38 Law & Hum. Behav. 283, 283 (2014).

 [94]. Id.

 [95]. Id.

 [96]. Michael R. Leippe, Donna Eisenstadt & Shannon M. Rauch, Cueing Confidence in Eyewitness Identifications: Influence of Biased Lineup Instructions and Pre-Identification Memory Feedback Under Varying Lineup Conditions, 33 Law & Hum. Behav. 194, 197 (2009).

 [97]. See infra Part IV (discussing how various jurisdictions have either adopted or rejected this reform).

 [98]. Steven E. Clark, Michael A. Erickson & Jesse Breneman, Probative Value of Absolute and Relative Judgments in Eyewitness Identification, 35 Law & Hum. Behav. 364, 364 (2011).

 [99]. Id.

 [100]. Id. at 377.

 [101]. See, e.g., Comm. on Sci. Approaches to Understanding & Maximizing the Validity & Reliability of Eyewitness Identification in Law Enf’t & the Courts et al., Identifying the Culprit: Assessing Eyewitness Identification 104 (2015) [hereinafter Comm. on Sci.].

 [102]. Leippe, supra note 96, at 196, 204.

 [103]. Id. at 196.

 [104]. Id.

 [105]. Id. at 194.

 [106]. Id.

 [107]. Id.

 [108]. See Dennis v. Sec’y, Pa. Dep’t of Corr., 834 F.3d 263, 341–44 (3d Cir. 2016) (McKee, C.J., concurring).

 [109]. Id. at 342.

 [110]. See, e.g., Richard A. Wise et al., An Examination of the Causes and Solutions to Eyewitness Error, Frontiers Psychiatry, Aug. 14, 2014, at 1, 4.

 [111]. Wright, supra note 83, at 747.

 [112]. Dennis, 834 F.3d at 342 (McKee, C.J., concurring).

 [113]. Id.

 [114]. Id.

 [115]. See generally Wise, supra note 110.

 [116]. Id. at 1–2.

 [117]. Dennis, 834 F.3d at 342–43 (McKee, C.J., concurring).

 [118]. See generally State v. Lawson, 291 P.3d 673 (Or. 2012); State v. Henderson, 27 A.3d 872 (N.J. 2011).

 [119]. N.C. Gen. Stat. § 15A-284.50–.53 (2015).

 [120]. See Henderson, 27 A.3d at 896–903.

 [121]. See id.

 [122]. Id. at 896–900.

 [123]. Id. at 878.

 [124]. Id. at 880–81.

 [125]. Id.

 [126]. Id.

 [127]. Id. at 881.

 [128]. Id.

 [129]. Id.

 [130]. Id.

 [131]. Id.

 [132]. See id. at 882–83.

 [133]. Id.

 [134]. See id. at 892, 896.

 [135]. Id. at 896–97.

 [136]. Id. at 897.

 [137]. Id.

 [138]. Id.

 [139]. Id.

 [140]. Id. at 897–98.

 [141]. Id. at 898.

 [142]. Id.

 [143]. Id.

 [144]. Id. at 900.

 [145]. Id. at 900–01.

 [146]. Id. at 901–02.

 [147]. Id.

 [148]. Id. at 910–11.

 [149]. Id. at 910.

 [150]. Id. at 894–95.

 [151]. Id. at 910–11.

 [152]. See id. at 910–11, 924–25.

 [153]. See State v. Lawson, 291 P.3d 673, 688 (Or. 2012).

 [154]. Id. at 678.

 [155]. Id.

 [156]. Id. at 683–84, 688–89.

 [157]. Id. at 683–84.

 [158]. Id. at 678–79.

 [159]. Id. at 679.

 [160]. Id.

 [161]. Id.

 [162]. Id.

 [163]. Id. at 679–80.

 [164]. Id. at 680.

 [165]. Id.

 [166]. Id. at 698.

 [167]. Id. at 685–88.

 [168]. Id. at 698.

 [169]. Id. at 693–94.

 [170]. Id. at 685.

 [171]. Id. at 686.

 [172]. Id.

 [173]. Id.

 [174]. Id.

 [175]. See id. at 698.

 [176]. Id. at 686–87.

 [177]. Id.

 [178]. Id. at 688.

 [179]. Id. at 693–94.

 [180]. Id.

 [181]. N.C. Gen. Stat. § 15A-284.50–.53 (2015).

 [182]. See id. § 15A-284.51–.52.

 [183]. Id. § 15A-284.52.

 [184]. Id. § 15A-284.52(b).

 [185]. Id.

 [186]. Id.

 [187]. Id.

 [188]. Id.

 [189]. Id. § 15A-284.52(c).

 [190]. Id. § 15A-284.53.

 [191]. Id.

 [192]. Id.

 [193]. See generally Comm. on Sci., supra note 101.

 [194]. See id. at xiii–xiv.

 [195]. Id. at 106.

 [196]. Id. at 105–12.

 [197]. Id. at 104.

 [198]. Id. at 107.

 [199]. Id. at 104.

 [200]. Id. at 106–07.

 [201]. Id. at 106.

 [202]. Id. at 107.

 [203]. Id.; see also State v. Henderson, 27 A.3d 872, 897–99 (N.J. 2011).

 [204]. Comm. on Sci., supra note 101, at 109.

 [205]. See id.

 [206]. Id. at 112.

 [207]. See id.

 [208]. Id.

 [209]. Id.

 [210]. Id. at 113–14.

 [211]. Id.

 [212]. See generally Simon, supra note 62.

 [213]. Id. at 82–83.

 [214]. Id. at 83–84.

 [215]. Id. at 86–87.

 [216]. Id. at 84.

 [217]. See id. at 84–86.

 [218]. Id. at 85.

 [219]. N.C. Gen. Stat. § 15A-284.52 (2015).

 [220]. Memorandum from Deputy Att’y Gen. Sally Q. Yates to Heads of Dep’t Law Enf’t Components All Dep’t Prosecutors (Jan. 6, 2017) (on file with author) [hereinafter Yates].

 [221]. Although Yates’s memo recommends against using a photograph that is “several years old,” the routine use of photographs that are no more than one year old would be ideal to increase the probability of accurate identifications. See id.

 [222]. Id.

 [223]. Simon, supra note 62, at 83.

 [224]. Yates, supra note 220.

 [225]. Id.

 [226]. Charman & Quiroz, supra note 86, at 484.

 [227]. See State v. Henderson, 27 A.3d 872, 897 (N.J. 2011).

 [228]. Leippe, supra note 96, at 197.

 [229]. Yates, supra note 220.

 [230]. Wise, supra note 110, at 4–5.

 [231]. Henderson, 27 A.3d at 882–84.

 [232]. Dennis v. Sec’y, Pa. Dep’t of Corr., 834 F.3d 263, 342 (3d Cir. 2016) (McKee, C.J., concurring).

 [233]. See N.C. Gen. Stat. § 15A-284.50–.53 (2015).
