Most genes code for proteins, which make up cellular structures and carry out the vast majority of cellular functions. To make a protein, the cell first copies a gene into an intermediary molecule, RNA, which serves as a kind of template for the protein. But some regions of the genome don't code for protein. Instead, they generate small pieces of RNA called microRNAs, which are only 18 to 25 nucleotide bases long, compared with the hundreds or, more typically, thousands of bases that make up genes. These microRNAs can control the expression of over 60 percent of our genes, dramatically influencing cellular behavior.
In the lab of Dr. Sharon E. Plon, graduate student Ninad Oak wanted to take a closer look at non-coding regions of the genome, or more specifically, changes in microRNAs. The Plon lab is especially interested in identifying variations in the genome that can increase the risk of cancer.
"I started the project thinking that we had focused on protein-coding regions for a long time. But they only represent one percent of the genome, so I thought that by looking at the remaining 99 percent we might find variations we have been missing that might explain some undiagnosed patient cases," Oak said. "Although the amount of microRNA that is found in the cell is often studied in human disease, microRNA variations that are associated with disease are understudied," Oak added.
Disruption in microRNA function can interfere with the body's careful control of gene expression and has been connected to a variety of health problems including heart disease, developmental disorders, and cancer.
"When he presented this proposal, I thought it was a good idea," said Plon, a professor of pediatrics - oncology and molecular and human genetics at Baylor and director of the Cancer Genetics Clinical and Research Programs at Texas Children's Hospital.
Oak crafted a computational tool called ADmiRE, which stands for Annotative Database for miRNA Elements, to analyze variations in human microRNA. The idea was to find the variations most likely to influence disease.
"There were multiple challenges when I started working on this project," Oak said. "Most datasets of genomic sequencing are of whole exome sequencing (WES), which captures only protein-coding regions. So first, I looked at how well WES datasets captured microRNAs and found that they captured about 50 percent."
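As a rough illustration of the kind of check Oak describes, the sketch below estimates what fraction of microRNA bases fall inside exome capture targets. It is not the method used in the study: the function names are hypothetical, the coordinates are made up, and intervals are assumed to be half-open (chromosome, start, end) tuples.

```python
def covered_length(region, targets):
    """Count the bases of one region that are overlapped by any target.

    Intervals are (chrom, start, end) with a half-open end coordinate.
    Targets may overlap each other; shared bases are counted once.
    """
    chrom, start, end = region
    # Clip each same-chromosome target to the region, keeping real overlaps.
    clipped = sorted(
        (max(start, t_start), min(end, t_end))
        for t_chrom, t_start, t_end in targets
        if t_chrom == chrom and t_start < end and t_end > start
    )
    total = 0
    cursor = start  # everything left of the cursor is already counted
    for seg_start, seg_end in clipped:
        seg_start = max(seg_start, cursor)
        if seg_end > seg_start:
            total += seg_end - seg_start
            cursor = seg_end
    return total


def capture_fraction(mirnas, targets):
    """Fraction of all microRNA bases that lie inside capture targets."""
    covered = sum(covered_length(m, targets) for m in mirnas)
    length = sum(end - start for _, start, end in mirnas)
    return covered / length


# Made-up coordinates, for illustration only.
toy_mirnas = [
    ("chr1", 100, 180),
    ("chr1", 500, 580),
    ("chr2", 1000, 1080),
    ("chr2", 3000, 3080),
]
toy_targets = [
    ("chr1", 50, 200),
    ("chr1", 540, 600),
    ("chr1", 560, 620),
    ("chr2", 900, 1000),
]
toy_fraction = capture_fraction(toy_mirnas, toy_targets)
```

With real data, the microRNA coordinates and the capture targets would come from annotation and capture-kit files rather than hand-written lists.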
Annotation tools already exist so that researchers can add information about the microRNAs into databases. But Oak had to figure out how good the annotations were.
"There are various annotation tools that identify where a mutation is in general in the genome, not exclusively in microRNA. I found that these tools didn't annotate microRNA accurately; they tended to favor the potential change to a protein-coding gene and not the impact on microRNAs. These tools also didn't include comprehensive information that would help us interpret and prioritize the potential role of that microRNA in disease," Oak noted.
Oak developed a new method to annotate microRNA variations correctly, then applied it to gnomAD, a massive WES dataset of adults. This established a baseline of microRNA variation in a normal human population.
"This approach allowed us to draw conclusions about how frequently microRNAs are variable in normal datasets," Oak said. "Knowing the background variation would help us identify potential microRNA variants in disease states."
Oak created a measurement called the allele frequency percentile score, which shows how much a microRNA varies compared to other microRNAs in the datasets and may indicate which ones are most likely to be linked to disease. The analysis then highlighted the group of microRNAs with the lowest level of variation, those in the lowest quartile of scores. That group is highly conserved in adults, meaning there is very little change in them because their function is so crucial; when they do change, the change is probably causing a problem.
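The article does not give the exact formula behind the score, so the following is only a sketch of the general idea: rank each microRNA's variation against all the others, express the rank as a percentile, and keep the lowest quartile. The function names, the single variation value per microRNA, and the numbers themselves are all assumptions made for illustration.

```python
from bisect import bisect_right


def percentile_scores(variation):
    """Map each microRNA to the percentage of microRNAs whose variation
    value is less than or equal to its own (a simple rank percentile)."""
    ordered = sorted(variation.values())
    n = len(ordered)
    return {
        name: 100.0 * bisect_right(ordered, value) / n
        for name, value in variation.items()
    }


def lowest_quartile(scores):
    """Names of microRNAs whose percentile score is 25 or below."""
    return sorted(name for name, score in scores.items() if score <= 25.0)


# Hypothetical per-microRNA variation values, not real gnomAD numbers.
toy_variation = {
    "mir-A": 0.0001, "mir-B": 0.02, "mir-C": 0.005, "mir-D": 0.3,
    "mir-E": 0.0004, "mir-F": 0.01, "mir-G": 0.05, "mir-H": 0.0009,
}
toy_scores = percentile_scores(toy_variation)
least_variable = lowest_quartile(toy_scores)
```

In this toy example, `least_variable` holds the two microRNAs that vary least, the ones a screen like this would flag as most likely to matter when they do change.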
Using the new tool, the researchers assessed mutations found in 10,000 different cases of cancer. That included 32 kinds of cancer found in the Cancer Genome Atlas' PanCancer Atlas.
"We found miR-142 mutations linked to hematologic cancers, confirming the finding made a few years ago. Also, we found microRNA mutations in miR-21, which had not been previously associated with cancer. Our analysis with ADmiRE suggests that these mutations may contribute to mechanisms involved in esophageal cancer.
"At a personal level, I found this work very satisfying because I think it contributes a new technique to our lab that fills a gap in the field," Oak said. "From the scientific point of view, ADmiRE offers a new resource for researchers who have not found a genetic cause for a disease in protein-coding genes. We have made this tool publicly available, and researchers can apply it to determine whether there is a signal in miRNA sequences. Maybe down the line, this tool could be used by clinical laboratories."
"I think it is an important tool," Plon added. "Mutations in microRNA have been missed for many years, but I think ADmiRE will now allow labs that have mutation data to see if these mutations that we know are important play a role in the biology of human health."