When geneticists began searching the human genome for errors that lead to disease, they found that many diseases traced back to a problem with a single gene. Now that many of those disease-causing mutations have been identified, genetic research has moved beyond that phase. Researchers now want to learn more about complex diseases, which are driven by many different and often minor changes in various genes, and about disease risk, which can be raised by small variations in gene sequences that are not damaging on their own.
To learn more about complex genetic risk factors and diseases, scientists have used genome-wide association studies (GWAS) to sift through the roughly 3.2 billion base pairs in the human genome. That approach has allowed them to find a few needles in the haystack of our genetic material.
"I view a GWAS as a way to reduce the size of the haystack into genomic regions that potentially could contain causal mutations underlying a trait," explained Alex Lipka, assistant professor of biometry in the Department of Crop Sciences at the University of Illinois.
GWAS uses computational tools to look for statistically significant variations in the genome. These variations mark the locations most likely to be associated with a particular trait of interest, such as high blood pressure. The regions that show an association can then be investigated in depth.
While this has been a useful technique, Lipka noted that it can fail to detect genes that make only a minor contribution, as well as interactions between genes that together produce an effect, known as epistasis. These genetic features may make a critical contribution that GWAS overlooks.
"The state-of-the-art statistical approach for GWAS is to test one marker at a time for the strength of its association with the trait," he said. "If you think about the true genetic underpinnings of a trait, it's not just one gene controlling things. Multiple genes contribute to phenotypic variation in an additive manner and are epistatically interacting with one another. What we try to do in our study is: explore the use of a statistical approach that is more biologically accurate. Not only are we finding statistical models that include multiple markers at a time; we also find multiple two-way interaction effects at a time."
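To make the difference between one-marker-at-a-time testing and testing two-way interactions more concrete, here is a toy sketch. It is not SPAEML or the study's software: the eight individuals, the three markers (coded -1/+1 rather than the usual 0/1/2) and the simulated trait are all hypothetical, and "strength of association" is simplified to the absolute Pearson correlation, which ranks markers the same way as the R² of a simple linear regression. Real GWAS pipelines use p-values, covariates and far larger datasets.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical toy data: 8 individuals, 3 markers coded -1/+1.
markers = {
    "m1": [1, 1, 1, 1, -1, -1, -1, -1],
    "m2": [1, 1, -1, -1, 1, 1, -1, -1],
    "m3": [1, -1, 1, -1, 1, -1, 1, -1],
}

# Simulated trait: m3 acts additively; m1 and m2 act ONLY through their interaction.
trait = [2 * c + 3 * a * b
         for a, b, c in zip(markers["m1"], markers["m2"], markers["m3"])]

# Single-marker scan: test one marker at a time.
single = {name: pearson_r(g, trait) for name, g in markers.items()}

# Pairwise scan: also test the product of each pair of markers as an interaction term.
names = list(markers)
pairwise = {
    (a, b): pearson_r([x * y for x, y in zip(markers[a], markers[b])], trait)
    for i, a in enumerate(names)
    for b in names[i + 1:]
}

print(single)    # m1 and m2 show r = 0.0; only m3 is detected
print(pairwise)  # the (m1, m2) interaction term has the largest |r|
```

In this contrived example the single-marker scan sees nothing at m1 or m2, even though together they drive most of the trait's variation; only the interaction term reveals them. That is the kind of epistatic signal Lipka describes one-marker-at-a-time testing as missing.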
The scientists developed a method called SPAEML, reported in the Nature journal Heredity, and assessed whether it could detect the genetic causes of simulated traits whose molecular basis resembled that of Alzheimer's disease in the human genome and of flower structure in the corn genome. Because researchers already know something about the genetics behind these traits, they offered a way to test the technique. The team built custom, freely available software and ran it on computers at the National Center for Supercomputing Applications (NCSA).
"In both the human and corn datasets, we were able to identify our simulated markers," Lipka revealed. "And in the human dataset, we were able to distinguish between additive and interacting loci."
Unfortunately, we haven't yet learned anything new about human disease, including Alzheimer's, because SPAEML was tested on simulated traits built from existing knowledge. However, the study shows that the approach can find genetic features that contribute to human disease, even in minor ways. Many of those small markers can add up in a person and cause a large shift in their risk for a disease.
Geneticists have known that complex traits are controlled by several genes, perhaps many, but we have lacked the computational tools to test how multiple markers or genes interact.
"The problem is the combinatorial explosion of possibilities that must be tested because we're looking at pairs of markers," explained co-author Liudmila Mainzer, technical program manager for Genomics at NCSA. "The algorithm needs to evaluate tens of thousands, hundreds of thousands, possibly millions of models in order to select the best one. It could take years in sheer computational time, which is why no one has ever done it."
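A quick count shows why pairs of markers cause this explosion. Among m markers there are m(m - 1)/2 distinct two-way interaction terms, so the number of candidates grows roughly with the square of the number of markers. The sketch below counts only those candidate pairs for a few hypothetical marker counts (not figures from the study); the number of models SPAEML actually fits depends on its own search procedure and is not computed here.

```python
from math import comb

def n_pairwise_terms(n_markers):
    """Number of distinct two-way interaction terms among n_markers markers: n(n-1)/2."""
    return comb(n_markers, 2)

# Hypothetical marker counts, chosen only to show the quadratic growth.
for m in (100, 10_000, 1_000_000):
    print(f"{m:>9,} markers -> {n_pairwise_terms(m):>15,} candidate pairs")
```

Going from 100 markers (4,950 pairs) to a million markers (499,999,500,000 pairs) multiplies the marker count by 10,000 but the pair count by roughly 100 million, before any model fitting even begins.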
The researchers are now planning to use SPAEML to learn more about the genetics of human disease, and have enlisted collaborators in the effort.
"This research is really hard, but it's the right way to approach this scientific problem. With access to supercomputing resources, outstanding students, and a bit of our own youthful foolhardiness, who knows, we might just manage it," Mainzer joked. "Based on the feedback we've had so far, it has been very rewarding."