MAY 13, 2015 10:30 AM PDT

Transcriptome Assembly: Computational Challenges of Next-Generation Sequence Data

  • Director, Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine; Professor, Departments of Biomedical Engineering, Computer Science, and Biostatistics
      Steven Salzberg is a Professor of Biomedical Engineering, Computer Science, and Biostatistics and the Director of the Center for Computational Biology in the McKusick-Nathans Institute of Genetic Medicine at Johns Hopkins University. He earned his B.A. and M.S. degrees from Yale University, and his Ph.D. from Harvard University. From 1997 to 2005 he was Senior Director of Bioinformatics at The Institute for Genomic Research (TIGR) in Rockville, Maryland. From 2005 to 2011 he was the Director of the Center for Bioinformatics and Computational Biology (CBCB) and the Horvitz Professor of Computer Science at the University of Maryland, College Park.

      Dr. Salzberg's interest in the human genome project motivated him to develop one of the first computational gene-finding systems for the human genome in the early 1990s. His initial collaborations with TIGR at that time led to the development of a gene-finding program, Glimmer, that has been used in the analysis of thousands of microbial genomes, including Borrelia burgdorferi, Mycobacterium tuberculosis, Vibrio cholerae, Bacillus anthracis, and many others. He was a co-founder of the Influenza Genome Sequencing Project, the first large-scale genomics study of human and avian influenza viruses. His current work focuses on algorithms for genome assembly and alignment, with particular emphasis on next-generation sequencing data. In recent years his group developed the Bowtie, TopHat, Cufflinks, and StringTie software for alignment and assembly of next-generation sequences from re-sequencing and RNA-seq experiments. All of his group's software is free and open source.

      Dr. Salzberg has authored or co-authored two books and over 200 publications in leading scientific journals. He was the 2013 winner of the Benjamin Franklin Award for Open Access in the Life Sciences, and the 2013 winner of the Robert G. Balles Prize in Critical Thinking for his Forbes science blog. In 2001 and again in 2014 he was named to the Thomson Reuters list of Highly Cited Researchers, a compilation of the 1% most-cited researchers in the world; his H-index is 110. He is a Fellow of the American Association for the Advancement of Science and of the International Society for Computational Biology.

      Salzberg Lab:


    Next-generation sequencing (NGS) technology allows us to peer inside the cell in exquisite detail, revealing new insights into biology, evolution, and disease that would have been impossible to discover just a few years ago. The enormous volumes of data produced by NGS experiments present many computational challenges that we are working to address. One of the most widely used sequencing methods is RNA-seq, which captures the genes being transcribed in a cell and uses sequencing to measure their levels of expression. In recent years, my lab has developed multiple systems for RNA-seq analysis, including the widely used Bowtie, TopHat, and Cufflinks programs for alignment and assembly of transcripts from RNA-seq data. In this presentation, I will describe two new systems, each of which represents a major step forward: (1) the HISAT system for spliced alignment of NGS reads, a successor to TopHat; and (2) the StringTie program for assembly and quantitation of RNA-seq data, a successor to Cufflinks. This talk describes joint work with Daehwan Kim and Mihaela Pertea.

    Learning Objectives:
    1. Explain the overall process used to turn RNA sequence data into a summary of genes and their expression levels.
    2. Describe why it is difficult to align a short RNA or DNA sequence to the human genome.
    3. Appreciate the computational challenge of assembling a complete, correct set of transcripts from a large next-generation sequencing experiment.
