DNA has been considered the gold standard of forensic evidence for more than 30 years, even as various types of junk science have fallen out of use. But in recent years, police and crime labs have stretched and expanded how they use genetic material to pin suspects—from tapping into private ancestry websites to creating police sketches of suspects’ faces.
The latest practice to come under scrutiny is an obscure technique, “probabilistic genotyping,” that takes incomplete or otherwise inscrutable DNA left behind at a crime scene, often in minuscule amounts, and runs it through a software program that calculates how likely it is to have come from a particular person. One such program, TrueAllele, has been used in more than 850 criminal cases over the past 20 years. The problem? No one knows whether it works—the code, developed by a private company called Cybergenetics, is proprietary.
Government crime labs that use the software don’t get access to the program’s source code. Employees of Cybergenetics don’t get access. Even the authors of the peer-reviewed studies of TrueAllele have never had access to the code.
But now, two criminal cases, one in the U.S. District Court for the Western District of Pennsylvania and another in the Appellate Division of the Superior Court of New Jersey, may give the world a first peek into TrueAllele’s secretive algorithm. Last month, the New Jersey judge ordered prosecutors to hand over the source code for TrueAllele, and a few weeks later, the federal judge in Pennsylvania did the same.
Experts say the program is so complex and so hidden from scrutiny that software bugs are inevitable.
“It is virtually certain that there are flaws in the TrueAllele software,” wrote Mats Heimdahl and Jeanna Matthews, two computer science experts, in a declaration to the federal court. “On average, there will be six flaws for every 1,000 lines of code, and TrueAllele has 170,000 lines of code.”
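The arithmetic behind that claim can be sketched in a few lines. The defect density and line count are the figures quoted in the declaration; the Poisson model used to gauge the chance of zero flaws is an assumption added here for illustration, not something the experts are reported to have used.

```python
import math

FLAWS_PER_KLOC = 6        # average flaws per 1,000 lines, as quoted in the declaration
LINES_OF_CODE = 170_000   # size of TrueAllele, as quoted in the declaration


def expected_flaws(lines, flaws_per_kloc):
    """Expected flaw count at a constant average defect density."""
    return lines * flaws_per_kloc / 1000


def log10_prob_zero_flaws(mean_flaws):
    """log10 of P(no flaws at all).

    ASSUMPTION: flaws are modeled as a Poisson process, so
    P(0) = exp(-mean). The log is returned because the probability
    itself is too small to represent as a float.
    """
    return -mean_flaws / math.log(10)


mean = expected_flaws(LINES_OF_CODE, FLAWS_PER_KLOC)
print(f"expected flaws: {mean:.0f}")
print(f"log10 P(zero flaws): {log10_prob_zero_flaws(mean):.0f}")
```

At the quoted density, the expected count is 1,020 flaws, and under the assumed model a flawless program of that size is vanishingly improbable. The estimate says nothing about which flaws, if any, would affect a reported result.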
While the orders do not make the source code publicly available, they allow the defense to bring in experts to examine the software for potential flaws and inconsistencies—under strict nondisclosure agreements. Should the experts find any problems, the defense attorneys could try to get the DNA evidence thrown out, weakening the cases against their clients and potentially leading to a larger reckoning for TrueAllele.
The software program’s workings are far more complex and difficult to reproduce than the traditional process of DNA testing, which matches a suspect to a robust type of evidence, like blood.
TrueAllele doesn’t match a suspect to physical evidence; it calculates the statistical likelihood that a person’s DNA is present in a complicated mixture of multiple people’s DNA, or in a minuscule amount of DNA left behind—for instance, after someone merely touched something.
In the Pennsylvania case, federal prosecutors used the software to parse a complicated mixture of DNA found on a handgun left in a car—and specifically, to calculate the likelihood that any of it belonged to a Pittsburgh man named Lafon Ellis. Cybergenetics experts testified that the sample likely contained DNA from four people and that it was “21.4 trillion times more probable” that Ellis’s DNA was in the mix than another, random African American person’s. Ellis’s attorney, however, says the gun isn’t Ellis’s and disputes the DNA evidence connecting him to it.
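Figures like this are commonly read as likelihood ratios: how much more probable the evidence is if the named person contributed to the mixture than if a random person did. Assuming that reading, the sketch below shows how such a ratio combines with prior odds under Bayes' rule. It does not reproduce TrueAllele's calculation, which is not public, and the prior values are hypothetical.

```python
import math


def posterior_probability(prior_probability, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior_probability / (1 - prior_probability)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


REPORTED_LR = 21.4e12  # "21.4 trillion", the figure quoted in the Ellis case

# Likelihood ratios this large are often easier to compare on a log scale.
print(f"log10(LR) = {math.log10(REPORTED_LR):.2f}")

# HYPOTHETICAL priors, chosen only to show how the ratio is applied.
for prior in (1e-9, 1e-6, 1e-3):
    print(f"prior {prior:g} -> posterior {posterior_probability(prior, REPORTED_LR):.6f}")
```

A ratio of this size overwhelms almost any prior, which is why the question of whether the software computed it correctly carries so much weight.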
Prosecutors are not the only ones to rely on the software: defense teams have also used it to help prove their clients’ innocence at trial or exonerate them later. Cybergenetics has made the program available to either side in any case for testing.
According to Cybergenetics, TrueAllele’s calculations have already been admitted into evidence in 14 states, and criminal defense teams have made 20 unsuccessful attempts in recent years to gain access to the program’s source code. Prosecutors have long argued that the program was built on widely accepted mathematical concepts and that opening up its source code to public scrutiny would threaten the company’s trade secrets.
But experts and civil rights advocates have long been concerned about the lack of scrutiny of the software. Their arguments may be gaining traction, as evidenced by the two rulings last month.
“Our justice system cannot permit convictions based on secret evidence,” wrote the Electronic Frontier Foundation and the American Civil Liberties Union of Pennsylvania in an amicus brief to the federal court. “There is a long history of junk science employed under the guise of technological advancement in criminal cases—and of public access to and analysis of such evidence as the means to its eventual invalidation.”
Many forensic methods that were once commonplace in crime labs and courts, from bite-mark pattern evidence and blood-spatter analysis to ballistics testing and fingerprint matching, have later been questioned or even totally debunked after facing outside scrutiny.
Life-or-death software programs, like those used in medical equipment or airplanes, must undergo independent validation and verification processes, but DNA software faces no such requirement. There are few federal regulations about how police or crime labs introduce new technologies and methods into crime-solving. Every state has its own rules about how a new type of forensic technology gets approved for use, and judges tend to follow prior decisions when deciding whether or not to allow new scientific techniques into evidence.
Reviewing TrueAllele will likely be complicated. Its inventor estimates that a person reading 10 lines of code an hour would need about eight and a half years to finish, an estimate defense attorneys dispute. How to go about the review has been another source of friction between prosecutors and defense attorneys.
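The eight-and-a-half-year figure can be checked against the 170,000 lines cited in the experts' declaration. The sketch below assumes a 2,000-hour working year (40 hours a week for 50 weeks), which is not stated in the estimate itself; the faster reading rates are hypothetical and only show how sensitive the result is to that one number.

```python
LINES_OF_CODE = 170_000   # figure quoted in the experts' declaration
LINES_PER_HOUR = 10       # reading rate used in the inventor's estimate
HOURS_PER_YEAR = 2_000    # ASSUMPTION: 40 hours a week for 50 weeks


def review_years(lines, lines_per_hour, hours_per_year=HOURS_PER_YEAR):
    """Years one full-time reviewer would need to read every line once."""
    return lines / lines_per_hour / hours_per_year


print(review_years(LINES_OF_CODE, LINES_PER_HOUR))  # 8.5

# HYPOTHETICAL faster rates, for comparison only.
for rate in (50, 100):
    print(rate, review_years(LINES_OF_CODE, rate))
```

Under these assumptions the estimate is internally consistent, so the dispute comes down to whether 10 lines an hour is a realistic pace for a reviewer.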
Originally, defense teams in the two cases were surprised to learn their access to the source code would be restricted to a read-only iPad and that they would be allowed to use only a pen and paper to take notes. They have since lobbied for and won access to an electronic copy they can run and test themselves.
Though TrueAllele has never faced scrutiny like this before, two of its competitors have.
When a federal judge of the U.S. District Court for the Southern District of New York ordered source-code access to New York City’s proprietary DNA analysis software Forensic Statistical Tool (FST) in 2016, the examination revealed a serious flaw that “tend[ed] to overestimate the likelihood of guilt.” The judge in that case eventually lifted the protective order on the software, and ProPublica obtained the code and published it on GitHub, allowing researchers and the general public to scrutinize it themselves.
The New York City Office of Chief Medical Examiner replaced FST with STRmix, a widely used competitor to TrueAllele. It so happens that the makers of STRmix have also acknowledged software bugs that affected 60 cases in Australia.
“Software errors are common and forensic software has no special immunity from the bugs and mistakes that plague software in other fields,” wrote Khasha Attaran, the assistant federal public defender representing Ellis, in an emailed statement. “Constitutional principles and fairness demand access because an accused must be allowed to examine and challenge the accuracy of software generated evidence relied on by the federal government.”
The federal prosecutors may appeal the decision to the U.S. Court of Appeals for the Third Circuit but haven’t indicated yet whether or not they will. The New Jersey case may head to the state’s Supreme Court. The U.S. Attorney’s office prosecuting the Ellis case for the Western District of Pennsylvania and the Hudson County Prosecutor’s Office, which brought the case in New Jersey, both declined to comment on ongoing cases, as did Mark Perlin, the founder of Cybergenetics and TrueAllele’s inventor.
Correction
This story previously stated that a federal judge ordered disclosure of FST's source code in 2017. The judge issued the order in 2016.