While the human genome has been sequenced, much of it remains poorly understood. The known coding regions of the genome hold the instructions for making around 20,000 proteins, which are critical to the structure and function of the body, and some of those genes have been studied at length. But DNA also harbors many very small coding genes. They can be hard to find, and little is known about them.
Now, researchers at the Salk Institute have found about 2,000 small genes, known as smORFs (small open reading frames), that code for microproteins. The method they developed to find these genes can now be used to search for more of them. The work has been reported in Nature Chemical Biology.
"We've expanded the human genome," said the co-corresponding author of the study, Salk Professor Alan Saghatelian. "This work can really be applied to better understand human biology and may eventually have implications for diseases ranging from cancer to diabetes."
Saghatelian and colleagues have spent about ten years developing ways to identify smORFs, and the microproteins they encode, that affect human health. Microproteins have already been linked to cell stress, immune function and muscle development. Saghatelian noted that these microproteins could offer new routes to drugs and disease biomarkers.
Thomas Martinez, a postdoctoral fellow in the Saghatelian lab and the first author of the study, used a tool called Ribo-Seq to find smORFs that code for proteins in cells. Ribo-Seq is often used to detect genes for larger proteins, but it was not as good at finding small ones until the researchers optimized it. These efforts led to the identification of around 7,500 smORFs in one type of cultured cell. Of those, about 1,500 were also found in two other cell lines. The researchers repeated their search to confirm that these genes were coding for real microproteins.
"We finally have reliable information that the human genome contains at least 2,500 to 3,500 smORFs," said Saghatelian.
Now the researchers want to know which of them are related to disease, and whether they could act as therapeutic targets. "Right now, our methods can tell us if a smORF exists or doesn't exist, but it doesn't give us a lot of information on what is actually related to disease," added Saghatelian. "Going forward, the lab will start doing more research to find smORFs that may be specific to diseases like cancer or diabetes."
This work is just getting started, and the investigators hope that other labs will look for more smORFs in other cell types.
"This is really an unexplored area," said Martinez. "At the end of the day, you want to know what all the parts are in the genome."