MAY 10, 2017 10:30 AM PDT

Methods to account for sequencing artifacts in large high-throughput RNA-Seq data

C.E. CREDITS: CEU | P.A.C.E. CE | Florida CE
  • Research Fellow, Dana-Farber Cancer Institute
      I am a Research Fellow in the Department of Biostatistics and Computational Biology at the Dana-Farber Cancer Institute and in the Department of Biostatistics at the Harvard T.H. Chan School of Public Health, working under the guidance of Professor John Quackenbush. Before joining Harvard, I was a National Science Foundation Graduate Research Fellow at the University of Maryland, College Park, where I received my Ph.D. in Applied Mathematics, Statistics and Scientific Computation.
      As a computer scientist and computational biologist, I am interested in developing computational methods for the analysis of high-throughput sequencing data. I also aim to implement and maintain these methods as open-source software for the broader scientific community through Bioconductor and popular domain tools such as QIIME and Phyloseq. My most popular tool, metagenomeSeq, ranked in the top 5% of all Bioconductor packages by downloads over the last year, with over 5,000 unique users. I am excited to leverage statistical and network methodologies to account for technological artifacts when identifying disease markers.


    Although ultrahigh-throughput RNA sequencing has become the dominant technology for genome-wide transcriptional profiling, the vast majority of RNA-seq studies profile only tens of samples, and most analytical pipelines are optimized for these smaller studies. However, projects are now generating ever-larger data sets comprising RNA-seq data from hundreds or thousands of samples, often collected at multiple locations and from diverse tissues. We examine the effects of different preprocessing methods on downstream analyses. We find that analysis of large RNA-seq data sets requires careful quality control and must account for the sparsity that arises from the heterogeneity intrinsic to multi-group studies. We illustrate our results using the GTEx cohort and examine the pathways that differ between cell lines and their progenitor tissues.
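    To make the sparsity point concrete, here is a minimal, hypothetical sketch (not the method presented in this webinar) comparing a global expression filter with a group-aware one on a toy count matrix. The function names, the counts-per-million (CPM) threshold of 1, and the "expressed in at least half of the samples" rule are all illustrative assumptions. A gene expressed in only one tissue looks sparse across the whole cohort, so a global rule can discard it even though it is reliably expressed within its own group.

```python
import numpy as np

def cpm(counts):
    """Counts per million: scale each sample (column) by its library size."""
    lib_sizes = counts.sum(axis=0)
    return counts / lib_sizes * 1e6

def global_filter(counts, min_cpm=1.0, min_frac=0.5):
    """Hypothetical rule: keep genes (rows) with CPM > min_cpm
    in at least min_frac of ALL samples."""
    expressed = cpm(counts) > min_cpm
    return expressed.mean(axis=1) >= min_frac

def group_aware_filter(counts, groups, min_cpm=1.0, min_frac=0.5):
    """Hypothetical rule: keep genes with CPM > min_cpm in at least
    min_frac of the samples of ANY single group (e.g. one tissue)."""
    expressed = cpm(counts) > min_cpm
    groups = np.asarray(groups)
    keep = np.zeros(counts.shape[0], dtype=bool)
    for g in np.unique(groups):
        # Fraction of samples within group g where each gene is expressed
        keep |= expressed[:, groups == g].mean(axis=1) >= min_frac
    return keep

# Toy data (invented for illustration): 3 genes x 6 samples, 3 tissues.
counts = np.array([
    [500, 600, 550, 450, 520, 480],  # gene 0: expressed everywhere
    [0,   0,   0,   0,   300, 350],  # gene 1: expressed only in tissue C
    [0,   0,   0,   0,   0,   0],    # gene 2: never detected
])
groups = ["A", "A", "B", "B", "C", "C"]

print(global_filter(counts).tolist())               # [True, False, False]
print(group_aware_filter(counts, groups).tolist())  # [True, True, False]
```

The global rule drops gene 1 because it is detected in only a third of the samples, while the group-aware rule keeps it because it is expressed in every tissue-C sample. Which rule is appropriate depends on the downstream question; the sketch only shows how cohort heterogeneity changes what "sparse" means.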
