In the previous session we saw how it is possible to arrive at frequencies as small as 1 in a trillion or less, starting with a sample of the population of as few as 500 individuals. The question inevitably arises--where does this sample come from and what population(s) should be used?
One convenient source of samples is blood banks. One well-known company that performs forensic DNA analyses obtained its samples from the Detroit blood bank, recording only the ethnic status of the donor. An early sample used by the FBI consisted of blood samples drawn from FBI agents. The FBI was criticized unjustly for this choice, as there is no reason to believe that FBI agents would have a different pattern or range of variation for VNTR and STR loci than the general population.
From what populations should the samples be taken? Long before the discovery of VNTR and STR loci it was known that small differences exist between ethnic and population groups in the frequencies of genes controlling blood types, as well as other loci. Not long after their discovery, VNTR and STR loci were shown to be no exception to this rule. On the surface, this perhaps is not a serious issue--all that is needed is a sample from the appropriate ethnic or population group. But a moment's reflection makes us realize that this is a considerably more complex issue.
Suppose that the defendant in a trial is a White male from western Michigan, with a Dutch last name. (A large proportion of the residents in that part of the state can trace their ancestry back to Dutch immigrants.) The question therefore arises--should we use a sample of White Americans to obtain our frequency data, or would it be more accurate to use a sample from the Netherlands? What if the defendant claims ancestors from Friesland in the northern part of the Netherlands; should we use a sample originating from that part of the country? If the defendant pleads "not guilty," should we bother with a Dutch sample or even a sample of White Americans? Perhaps we should consider samples of Black or Asian Americans as well.
This issue figured prominently in a pretrial hearing for a homicide case in Vermont in the early nineties. The accused presented evidence that he was almost one half Abnaki Indian and half Italian. The defense argued that DNA evidence should not be admissible in this case as there was no suitable reference database. The judge ruled in favor of the defense, saying "it is unclear which if any of the FBI databases is appropriate for calculating the probability of a coincidental match." However, the defense's argument--and the judge's ruling--was quite illogical: by entering a plea of not guilty, the defense claimed that the forensic sample did not originate with the accused. Therefore his ethnic background should have been irrelevant!
A committee of the National Academy of Sciences first recommended that this problem be addressed by calculating the frequency of a DNA profile in the three major ethnic groups in the country--Blacks, Hispanic-surnamed and Whites--and presenting the largest of the values to the court. This so-called "ceiling" principle was employed for several years in many court cases throughout the country. The practice was criticized, however, on the grounds that presenting a jury with only one number restricts the information available to the court. The recommendation was amended by a later committee, which recommended the current practice of presenting the court with the values for all three (or more) ethnic groups.
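The two reporting practices described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the allele labels and frequencies are invented, not taken from any real database, and the per-group profile frequency is computed by multiplying Hardy-Weinberg genotype frequencies across loci, which this section does not itself spell out.

```python
from math import prod

# Hypothetical three-locus profile: the two alleles observed at each locus.
# The second locus is homozygous.
PROFILE = [("a1", "a2"), ("b1", "b1"), ("c1", "c2")]

# Hypothetical allele frequencies per reference group, one dict per locus.
# All numbers are made up for illustration.
FREQS = {
    "Black":    [{"a1": 0.10, "a2": 0.05}, {"b1": 0.20}, {"c1": 0.08, "c2": 0.12}],
    "Hispanic": [{"a1": 0.12, "a2": 0.04}, {"b1": 0.15}, {"c1": 0.10, "c2": 0.10}],
    "White":    [{"a1": 0.08, "a2": 0.06}, {"b1": 0.25}, {"c1": 0.05, "c2": 0.15}],
}


def genotype_frequency(locus_freqs, alleles):
    """Assumed Hardy-Weinberg proportions: p^2 if homozygous, 2pq otherwise."""
    first, second = alleles
    p, q = locus_freqs[first], locus_freqs[second]
    return p * p if first == second else 2 * p * q


def profile_frequency(group_freqs, profile):
    """Multiply the genotype frequencies across loci (assumes independent loci)."""
    return prod(genotype_frequency(f, a) for f, a in zip(group_freqs, profile))


# Later practice: report the value for every group.
by_group = {group: profile_frequency(f, PROFILE) for group, f in FREQS.items()}

# Earlier practice as described above: report only the largest value.
largest_group = max(by_group, key=by_group.get)

for group, freq in by_group.items():
    print(f"{group}: 1 in {1 / freq:,.0f}")
print(f"Largest frequency: {largest_group}")
```

With only three loci the invented frequencies are on the order of 1 in 100,000; the point of the sketch is the comparison across groups, not the magnitudes.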
Variation between ethnic and population groups was more of an issue in the early days of the presentation of DNA data in the courtroom, when the number of VNTR loci used was smaller than today (four and sometimes even fewer), and the frequencies were correspondingly larger. Today it is commonplace to use six or more loci (10 were used in the murder trial against former American football player O.J. Simpson), so the frequencies of the profiles are much smaller. Thus, the outcome of a case is seldom going to turn on the difference between, say, 1 in 5.5 trillion and 1 in 3 trillion, when the world population is only a little more than 6 billion.
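A rough way to see why this difference rarely matters is to convert each frequency into an expected number of matching people worldwide. The sketch below uses the figures quoted above; the function name is hypothetical, and treating every person as an independent draw (ignoring relatives and population structure) is a simplifying assumption.

```python
# "A little more than 6 billion", per the text.
WORLD_POPULATION = 6e9


def expected_matches(one_in, population=WORLD_POPULATION):
    """Expected number of people sharing a profile whose frequency is 1 in `one_in`.

    Simplifying assumption: people are independent draws from one population.
    """
    return population / one_in


for one_in in (5.5e12, 3e12):
    matches = expected_matches(one_in)
    print(f"1 in {one_in:.1e}: about {matches:.4f} expected matches worldwide")
```

Under these assumptions both values are far below one expected match in the entire world population, so the choice between them would seldom change the conclusion.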