SEP 11, 2019 1:30 PM PDT

Development of Machine Learning Algorithms to Data Mine Meta-Omic Characterizations of Complex Microbiota

Sponsored by: PacBio

Abstract

Background:  The vast majority of all genes are contained within the genomes of the prokaryotes, including the eubacteria and the archaea.  These largely single-cellular domains of life thus contain most of the metabolic machinery housed within the earth’s biosphere.  The gene systems that encode this machinery include entire pathways for the biosynthesis and catabolism of literally millions of natural products.  A small subset of species within the eubacteria are also associated with disease as pathogens, and these organisms produce a specialized set of secondary metabolic products termed virulence factors.  The vast majority of all prokaryotic genes are either unannotated or under-annotated with respect to the functions of their encoded proteins.  

Rationale:  To develop computational means to identify the specific genes, and the metabolic pathways that they encode, that underlie traits of interest for the manipulation of prokaryotic physiology to improve human life and health.

Specific Aim of the current research: To develop generic unbiased computational means to identify unannotated bacterial genes associated with pathogenesis, virulence and tissue tropism.

Results:  Our initial tools for the identification of novel bacterial virulence genes were adapted from the statistical genetic approaches used in eukaryotic gene mapping.  Following the statistical identification of our first set of candidate unannotated virulence genes from the human obligate pathogen Haemophilus influenzae, we demonstrated, using a combination of in vitro experiments and in vivo animal models, that the identified genes’ cognate proteins were actual virulence factors.  Follow-on studies of one of these novel proteins, Msf1, provided mechanistic details regarding its mode of action.  Subsequently, we developed random forest and neural network-based machine-learning approaches for a more thorough search of H. influenzae’s virulence/tropism genes.  Through multiple rounds of parameter tuning we developed a reliable random forest program that achieved greater than 85% specificity in determining which disease (out of five) a given bacterial strain was isolated from.  Examining the genes selected by the random forest classifier yields a rich source of novel unannotated genes from within the microbial genomic dark matter, enabling a focused approach to characterizing new biology relating to pathogenesis.  Notably, four of the top five genes used by the classifier have no annotation whatsoever.  Using a second neural network approach, in this case for protein annotation, we have been able to assign, with a high degree of confidence, at least one GO (gene ontology) term for 14% of the 13,692 hypothetical proteins encoded by the Moraxella catarrhalis pan (supra) genome.
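To make the specificity figure for a five-class problem concrete, here is a minimal sketch of how one-vs-rest specificity can be computed per disease class from a confusion matrix. The abstract does not say how the reported value was calculated or averaged, so this is only one common convention; the class labels and all counts below are hypothetical.

```python
def per_class_specificity(confusion, labels):
    """One-vs-rest specificity for each class of a square confusion matrix.

    confusion[i][j] is the number of strains whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    total = sum(sum(row) for row in confusion)
    result = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]                        # class k, predicted k
        fn = sum(confusion[k]) - tp                 # class k, predicted other
        fp = sum(row[k] for row in confusion) - tp  # other class, predicted k
        tn = total - tp - fn - fp                   # other class, predicted other
        result[label] = tn / (tn + fp) if (tn + fp) else float("nan")
    return result


# Hypothetical labels and counts, for illustration only (not study data).
labels = ["disease_1", "disease_2", "disease_3", "disease_4", "disease_5"]
confusion = [
    [18, 1, 0, 1, 0],
    [2, 15, 1, 1, 1],
    [0, 1, 17, 1, 1],
    [1, 0, 2, 16, 1],
    [0, 1, 1, 0, 18],
]

specificity = per_class_specificity(confusion, labels)
for label in labels:
    print(f"{label}: {specificity[label]:.4f}")
```

Specificity here answers the question "of the strains that did not come from this disease, what fraction did the classifier correctly keep out of this class?" It is a different quantity from overall accuracy, so the two should not be compared directly.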

Discussion:  Through the combination of multiple machine learning algorithms we have developed the beginnings of a pipeline for the in silico identification and characterization of novel unannotated genes.

Conclusions:  The development of high-throughput whole genome sequencing together with the creation of a suite of unbiased methods for the identification and characterization of unannotated prokaryotic genes that are associated with specific measurable traits will provide a universal method for targeted gene characterization leading to the discovery of novel biology underlying any metabolic process of interest.

Learning Objectives:

1. Understand that, even in this era of massively high-throughput whole-genome sequencing, the vast majority of prokaryotic genes and gene systems are completely unannotated, meaning that we do not know what the genes we sequence encode.
2. Understand how machine-learning (artificial intelligence) approaches are being used to begin developing a computational pipeline that can: (a) identify the genes involved in a particular process; and (b) annotate the identified genes as to likely molecular function.
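As a rough illustration of step (a), the sketch below ranks genes by how strongly their presence across strains is associated with a binary trait. It uses a one-sided Fisher's exact test as a simple stand-in; it is not the random forest or the statistical genetic method described in the abstract. The gene names, the presence/absence calls, and the trait labels are all hypothetical.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """P(X >= a) under the hypergeometric null for the 2x2 table
        gene present: a strains with the trait, b without
        gene absent:  c strains with the trait, d without
    """
    n_present = a + b
    n_trait = a + c
    n = a + b + c + d
    upper = min(n_present, n_trait)
    tail = sum(
        comb(n_present, k) * comb(n - n_present, n_trait - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n, n_trait)


def rank_genes(presence, trait):
    """Return (p_value, gene) pairs sorted from strongest to weakest association.

    presence maps each gene name to a list of 0/1 calls, one per strain;
    trait is a list of 0/1 labels in the same strain order.
    """
    scores = []
    for gene, calls in presence.items():
        a = sum(1 for g, t in zip(calls, trait) if g and t)
        b = sum(1 for g, t in zip(calls, trait) if g and not t)
        c = sum(1 for g, t in zip(calls, trait) if not g and t)
        d = sum(1 for g, t in zip(calls, trait) if not g and not t)
        scores.append((fisher_one_sided(a, b, c, d), gene))
    return sorted(scores)


# Hypothetical toy data: 8 strains, the first 4 isolated from the trait of interest.
trait = [1, 1, 1, 1, 0, 0, 0, 0]
presence = {
    "gene_x": [1, 1, 1, 1, 0, 0, 0, 0],  # present only in trait-positive strains
    "gene_y": [1, 0, 1, 0, 1, 0, 1, 0],  # unrelated to the trait
    "gene_z": [1, 1, 1, 1, 1, 1, 1, 1],  # core gene, present in every strain
}

ranked = rank_genes(presence, trait)
for p_value, gene in ranked:
    print(f"{gene}: p = {p_value:.4f}")
```

A real analysis over thousands of genes would also need multiple-testing correction and some way to account for strain relatedness, both of which this sketch leaves out.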

 



