If you are looking to learn more about your ancestry, you may have noticed the commercials for DNA test kits that promise to tell you where your family comes from. Through these DNA tests, you can connect with family members you didn’t know you had, learn more about your cultural background, or sometimes even find out if you’re related to someone famous.
But how do they work? And how do researchers measure DNA to provide answers about your ancestry?
It all starts with a DNA test kit. Different companies offer their own versions, but most consist of a saliva collection kit, in which you spit into a tiny test tube filled with a solution. You then mail the test tube to a lab.
Lab technicians extract the DNA from the tube and measure specific positions within the whole set of genetic instructions, often referred to as the human genome, using a small rectangular chip called a microarray. This tool can measure millions of different locations within the genome at the same time. Human DNA is nearly identical from person to person, but there are small differences unique to each individual that are tied to health conditions, ancestry and other information. These differences are called variants and are what scientists are measuring to determine ancestry.
To measure these variants, chemical enzymes separate the double-stranded DNA from your sample into single strands and then cut up the DNA into smaller fragments. These DNA fragments then enter the microarray.
The DNA microarray contains thousands of spots, each holding single DNA strands; some spots carry the common version of a DNA sequence and others carry the genetic variants of interest. DNA from your sample binds to the matching DNA on the chip. If you have a specific variant, your DNA will bind to the spot on the chip with that variant.
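To make the binding step concrete, here is a toy sketch in Python. It assumes a fragment sticks to a spot only when it is the exact reverse complement of the strand on that spot; real chips read out the strength of a fluorescent signal rather than a simple yes or no, and the spot names and sequences below are made up for illustration.

```python
# Toy model of microarray binding. All names and sequences are hypothetical.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    """Return the strand that would pair with seq, read in the opposite direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def binds(fragment, probe):
    """Assumed rule: bind only on an exact reverse-complement match."""
    return fragment == reverse_complement(probe)

def read_chip(fragments, probes):
    """Return the names of spots that at least one sample fragment binds to."""
    return {
        name
        for name, probe in probes.items()
        if any(binds(fragment, probe) for fragment in fragments)
    }

# Two spots for the same location: one common version, one variant.
probes = {
    "site1_common": "ACGTTGCA",
    "site1_variant": "ACGTAGCA",
}

# A sample containing the variant, plus an unrelated fragment.
sample_fragments = [reverse_complement("ACGTAGCA"), "GGGGCCCC"]

lit_spots = read_chip(sample_fragments, probes)
print(lit_spots)
```

In this sketch only the variant spot lights up, which is the signal a lab would read as "this person carries the variant at this location."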
The company that provides the DNA service then compares your data to reference panels, which contain the variants associated with different ancestries. If a certain variant is found only in people from one region, such as Europe or Africa, and the laboratory finds that variant in your sample, then you most likely have ancestors from that geographical region. DNA ancestry companies go through each part of your genome that they test to detect which variants are present, then compile a report showing the estimated percentage of your ancestry from each geographical region.
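The comparison against a reference panel can be sketched in a few lines of Python. This is a simplified illustration, not any company's actual algorithm: it assumes the tested genome is split into segments, that the panel lists how often each variant appears in each region, and that each segment is assigned to the region that makes your variants most likely. The region names, frequencies and sample data are all invented.

```python
import math
from collections import Counter

# Hypothetical reference panel: for each region, one list per genome segment
# giving how often the variant appears at each tested site (never exactly 0 or 1).
PANEL = {
    "Region A": [[0.9, 0.8], [0.7, 0.9], [0.8, 0.6], [0.9, 0.9]],
    "Region B": [[0.1, 0.2], [0.2, 0.1], [0.3, 0.1], [0.2, 0.1]],
}

# Hypothetical sample: 1 means the variant was detected at that site, 0 means not.
SAMPLE = [[1, 1], [1, 1], [0, 0], [1, 1]]

def log_likelihood(calls, freqs):
    """Log-probability of seeing these variant calls given the panel frequencies."""
    return sum(math.log(f if call else 1.0 - f) for call, f in zip(calls, freqs))

def best_region(calls, segment_index, panel):
    """Pick the region whose frequencies best explain one segment's calls."""
    return max(
        panel,
        key=lambda region: log_likelihood(calls, panel[region][segment_index]),
    )

def ancestry_percentages(sample, panel):
    """Assign every segment to a region, then report each region's share."""
    winners = Counter(
        best_region(calls, i, panel) for i, calls in enumerate(sample)
    )
    return {region: 100.0 * winners[region] / len(sample) for region in panel}

report = ancestry_percentages(SAMPLE, PANEL)
print(report)
```

With these made-up numbers, three of the four segments look most like Region A and one looks most like Region B, so the report reads 75% and 25%. Swapping in a different panel can change those percentages even though the sample itself stays the same.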
Each DNA ancestry company has its own process for developing these reports. Usually, it involves a proprietary ancestry algorithm that analyzes datasets of genetic information, which can include the reference panels, public databases such as the 1000 Genomes Project (funded in part by the National Institutes of Health), and the company’s own customer base.
The ancestry information in the report is only as good as the reference panel that researchers use. The variants in a panel have been measured in numerous individuals whose ancestry extends back multiple generations in a specific geographic location, such as Africa, Europe, Asia or South America. In addition, the reference panels have thousands of genomes representing these individuals with different ancestral backgrounds. When this technology first became available in the early 2000s, some reference panels were incomplete, which meant companies couldn’t test for or identify certain heritages.
However, since then, measurements have improved thanks to companies collecting DNA data from more individuals with different genetic ancestries as well as the availability of standards, such as DNA reference materials, to which researchers can compare their results.
These reference materials can help ancestry DNA companies benchmark their methods. The National Institute of Standards and Technology (NIST) provides four different human genome reference materials (RM 8391 Human DNA for Son of Eastern European Ashkenazi Jewish Ancestry, RM 8392 Human DNA for Family Trio of Eastern European Ashkenazi Jewish Ancestry, RM 8393 Human DNA for Son of Chinese Ancestry, RM 8398 Human DNA for Daughter of Utah/European Ancestry) that could help check the accuracy of genomic measurements when using the microarrays.
NIST is also working with NIH on the Human Pangenome Reference Consortium, whose goal is to characterize genomes of diverse ancestries. Though these datasets are different from the reference panels created by ancestry DNA companies, they may be helpful in benchmarking the companies’ methods and improving their proprietary datasets.
If you have taken these tests, you may see the percentages in your ancestry report fluctuate over time, or if you’ve taken multiple DNA tests with different companies you may see slight differences in the report numbers. This could be due to differences in each company’s methods, the continued growth and improvement of the reference datasets, and other factors. Keep in mind that the field of DNA ancestry is new and still developing, and companies are working with the best algorithms and data they have at any given time.
It’s important to note that there is a difference between ancestry and race. Genetic ancestry is based upon a person’s DNA, which can be traced back to the genetic sequences of their ancestors. So, if a person’s ancestry report says they are 34% East Asian, that means about 34% of that person’s DNA is most similar to the DNA of many of the people living in that geographic region now. Race, on the other hand, is a social construct that can consist of multiple factors such as cultural characteristics (language, customs and religion), skin color or other physical characteristics. The two terms shouldn’t be confused with each other.
Once upon a time not too long ago, we could only learn our ancestry through family lore or old government records. Nowadays, we can find answers within our DNA — thanks in large part to measurement science.