We introduce a method for differential abundance analysis of sparse high-throughput data from large-scale marker-gene surveys of microbial communities. Our approach relies on cumulative sum scaling (CSS), a normalization technique for count data, and on the zero-inflated Gaussian (ZIG) model, a statistical method for detecting differential abundance of taxonomic features. The ZIG model accounts for the bias introduced by the under-sampling of microbial communities that is common in large-scale marker-gene studies. We have implemented these methods in the publicly available metagenomeSeq Bioconductor package. In addition, we highlight the utility of the method in a large-scale study characterizing the diarrheal microbiome in young children from developing countries. Diarrhea is a major cause of mortality and morbidity in young children from developing countries, leading to as many as 15% of all deaths in children under 5 years of age. While many causes of this disease are already known, conventional diagnostic approaches fail to detect a pathogen in up to 60% of diarrheal cases. Using our novel methodology, we found Streptococci to be statistically associated with diarrheal disease in general and with more severe forms (such as dysentery) in particular.