AUG 20, 2014 07:00 AM PDT

Human Genome Analysis


  • Albert L. Williams Professor of Biomedical Informatics, Co-Director, Yale Computational Biology and Bioinformatics Program, Yale University

      Mark Gerstein is the Albert L. Williams Professor of Biomedical Informatics at Yale University. He is co-director of the Yale Computational Biology and Bioinformatics Program, and has appointments in the Department of Molecular Biophysics and Biochemistry and the Department of Computer Science. He received his AB in physics summa cum laude from Harvard College and his PhD in chemistry from Cambridge. He did post-doctoral work at Stanford and took up his post at Yale in early 1997. Since then he has published extensively, with more than 400 publications in total, a number of them in prominent journals such as Science, Nature, and Scientific American. (His current publication list is at .) His research is focused on bioinformatics, and he is particularly interested in large-scale integrative surveys, biological database design, macromolecular geometry, molecular simulation, human genome annotation, gene expression analysis, and data mining.


    The ENCODE and modENCODE consortia have generated a resource containing large amounts of transcriptomic data, extensive mapping of chromatin states, as well as the binding locations of >300 transcription factors (TFs) for human, worm and fly. We performed extensive data integration by constructing genome-wide co-expression networks and transcriptional regulatory models, revealing fundamental principles of transcription conserved across the three highly divergent animals.
    In particular, we found that gene expression levels in all three organisms, for both coding and non-coding genes, can be predicted consistently from their upstream histone marks. In fact, a "universal model" with a single set of cross-organism parameters can predict expression levels for both protein-coding genes and ncRNAs. Carrying out the same type of "predictions" for TFs, we found that the information in their binding is more localized to the region near the transcription start site (TSS) than that of histone marks, but is largely redundant with that of the marks.
    Surprisingly, only a small number of TFs are necessary in the models to successfully predict expression (e.g. ~5 of the >1000 in human).
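
    The idea of a "universal model" with one parameter set shared across organisms can be illustrated with a toy example. The sketch below is not the consortium's method or data: it assumes a plain linear model, four invented histone-mark signals per gene, made-up weights, and synthetic genes for three organisms. It fits a single set of weights on the pooled data and then checks, in-sample, how well those shared weights predict expression within each organism separately.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 4 histone-mark signals per gene and invented "true" weights.
N_MARKS = 4
TRUE_W = np.array([1.5, -0.8, 0.6, 0.3])


def simulate(n_genes, noise):
    """Return (marks, expression) for one synthetic organism."""
    marks = rng.normal(size=(n_genes, N_MARKS))
    expression = marks @ TRUE_W + rng.normal(scale=noise, size=n_genes)
    return marks, expression


# Synthetic stand-ins for the three organisms; gene counts are arbitrary.
organisms = {
    "human": simulate(500, 0.5),
    "worm": simulate(300, 0.5),
    "fly": simulate(300, 0.5),
}


def add_intercept(marks):
    """Append a constant column so the model can learn an offset."""
    return np.column_stack([marks, np.ones(len(marks))])


def fit_universal(data):
    """Least-squares fit of ONE weight vector on genes pooled across organisms."""
    marks = np.vstack([m for m, _ in data.values()])
    expression = np.concatenate([e for _, e in data.values()])
    weights, *_ = np.linalg.lstsq(add_intercept(marks), expression, rcond=None)
    return weights


def predict(weights, marks):
    return add_intercept(marks) @ weights


w = fit_universal(organisms)

# Evaluate the shared weights within each organism (Pearson correlation).
scores = {
    name: float(np.corrcoef(predict(w, marks), expression)[0, 1])
    for name, (marks, expression) in organisms.items()
}
for name, r in scores.items():
    print(f"{name}: Pearson r = {r:.2f}")
```

    Because the synthetic organisms share the same underlying weights by construction, the pooled fit predicts well in each of them; with real data, that agreement is the empirical finding rather than a given, and a held-out evaluation would be needed.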
