While the human genome was sequenced many years ago, there is still much to learn about it beyond its sequence: how its structure influences gene expression, how it is affected by chemical tags that can be added to or removed from DNA, and what the portions of the genome that do not code for protein actually do. We now know that around 98 percent of the human genome does not code for protein, and scientists are starting to learn more about these noncoding regions, which were once written off as junk.
Reporting in Nature, researchers have now created EpiMap (Epigenome Integration across Multiple Annotation Projects). This catalog of epigenetic marks can show which genes are active or inactive in different cell types, and it was built from an analysis of 833 tissues and cell types. The scientists identified groups of regulatory features that influence biological processes and have proposed potential impacts for around 30,000 genetic variants that have been associated with 540 traits.
"What we're delivering is really the circuitry of the human genome. Twenty years later, we not only have the genes, we not only have the noncoding annotations, but we have the modules, the upstream regulators, the downstream targets, the disease variants, and the interpretation of these disease variants," said the senior study author Manolis Kellis, a professor of computer science, a member of MIT's Computer Science and Artificial Intelligence Laboratory and of the Broad Institute of MIT and Harvard.
Known epigenetic marks include histone modifications, methyl groups, and the accessibility of regions of DNA (which depends on how the DNA is organized in three dimensions in the nucleus of a cell). These epigenetic features help control gene activity.
"Epigenomics directly reads the marks used by our cells to remember what to turn on and what to turn off in every cell type, and in every tissue of our body. They act as post-it notes, highlighters, and underlining," Kellis explained. "Epigenomics allows us to peek at what each cell marked as important in every cell type, and thus understand how the genome actually functions."
The genes that are active in a cell determine its role and identity, so mapping epigenetic marks can reveal more about how the biology of a cell is controlled. Some of the regulatory elements these marks highlight, known as enhancers, promote gene activity, while others, known as repressors, reduce it.
In this work, the researchers analyzed 833 samples that represented a diverse group of tissue types and integrated available data into their maps. Over 2 million enhancer sites have now been annotated, representing around 13 percent of the genome. The researchers grouped these enhancers based on their patterns of activity and connected them to the pathways they influence, their regulators, and the sequences that mediate this activity. The scientists also predicted about 3 million links between genes and their control elements, creating the most current and comprehensive map of human gene circuits.
This research may help scientists learn more about the effects of genetic variants that have been linked to diseases but sit in regions of DNA that do not code for protein. Around 93 percent of the disease-associated genetic variants that have been identified by genome-wide association studies (GWAS) sit in noncoding regions. We still know very little about the collective effect of all these noncoding regions in different tissues.
This study can help us begin to learn about the effects of these variants; already, the researchers have provided mechanistic possibilities for over 30,000 noncoding GWAS variants, and suggested that some characteristics or diseases are influenced by enhancers that affect many kinds of tissue.
"We hope that our predictions will be used broadly in industry and in academia to help elucidate genetic variants and their mechanisms of action, help target therapies to the most promising targets, and help accelerate drug development for many disorders," concluded Kellis.