Applications of Machine Learning to Predict Clinical Provenance of Haemophilus influenzae

C.E. Credits: P.A.C.E. CE Florida CE

Abstract

Background: Haemophilus influenzae causes multiple human diseases at multiple sites in the human body. The underlying genetic mechanisms remain elusive, as is often the case for species that occupy diverse ecological niches in the human body. Our lab and others have sequenced the whole genomes of over 1,600 strains of non-typeable Haemophilus influenzae (NTHi). These strains were isolated from human patients with various disease states (as well as from carriers colonized without disease symptoms) and from multiple locations within and on subjects’ bodies.

Methods: 1,618 genomes were assembled using sequencer-appropriate assembly software. Automatic gene annotation was performed using Prokka, and pan-genome gene cluster analysis was performed with Roary. A gene presence/absence matrix of 4,207 gene clusters was used as the set of gene ‘features’ to predict, for each isolate, A) body site of isolation and B) disease state of the patient. Additionally, ‘core’ genes (genes present in all NTHi strains) were converted to numeric vectors and used as a separate feature set. Three classification algorithms were evaluated on each of the two feature sets.
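
As an illustration of the first feature set, the sketch below turns a small presence/absence table into one binary feature vector per isolate. It assumes a tab-delimited layout like the one Roary writes (one row per gene cluster, one column per isolate, 1 for present and 0 for absent). The gene names, isolate names, and the helper load_presence_absence are hypothetical and are not taken from the study.

```python
import csv
import io

# Toy stand-in for a presence/absence table (assumed layout, hypothetical names):
# one row per gene cluster, one column per isolate, 1 = present, 0 = absent.
RTAB = (
    "Gene\tisolate_A\tisolate_B\tisolate_C\n"
    "geneX\t1\t1\t1\n"
    "geneY\t1\t0\t0\n"
    "geneZ\t0\t1\t1\n"
)


def load_presence_absence(handle):
    """Return (gene_names, {isolate_name: [0/1 per gene cluster]})."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    isolates = header[1:]  # first column holds the gene cluster name
    genes = []
    features = {name: [] for name in isolates}
    for row in reader:
        if not row:  # skip blank lines
            continue
        genes.append(row[0])
        for name, value in zip(isolates, row[1:]):
            features[name].append(int(value))
    return genes, features


genes, features = load_presence_absence(io.StringIO(RTAB))

# Clusters present in every isolate are 'core': they carry no
# presence/absence signal, so their sequence has to be encoded separately.
core = [
    gene
    for i, gene in enumerate(genes)
    if all(vector[i] == 1 for vector in features.values())
]

for name, vector in features.items():
    print(name, vector)
print("core clusters:", core)
```

Each isolate's vector can then be paired with its body-site or disease-state label and passed to a classifier. How the core genes were converted to numeric vectors is not described here, so that step is not sketched.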

Results: Imbalance in the number of isolates per class proved challenging for the machine learning (ML) algorithms. All algorithms performed significantly better than the ‘No Information Criteria’ baseline in predicting both body site and disease state, though in all cases predicting fewer, more balanced classes was associated with higher accuracy.
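
To make the effect of class imbalance concrete, the sketch below compares a classifier's accuracy with a no-information baseline, taken here to be the accuracy obtained by always predicting the most frequent class. Whether this is exactly the baseline used in the study is an assumption. The labels and predictions are invented for illustration and are not study data.

```python
from collections import Counter


def no_information_rate(y_true):
    """Accuracy of always predicting the most frequent class."""
    counts = Counter(y_true)
    return max(counts.values()) / len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical, deliberately imbalanced labels (not from the study).
y_true = ["ear"] * 6 + ["lung"] * 3 + ["blood"] * 1
y_pred = ["ear"] * 6 + ["lung", "lung", "ear"] + ["ear"]

nir = no_information_rate(y_true)
acc = accuracy(y_true, y_pred)
recall = per_class_recall(y_true, y_pred)

print(f"no-information rate: {nir:.2f}")
print(f"accuracy:            {acc:.2f}")
for label, value in recall.items():
    print(f"recall[{label}]: {value:.2f}")
```

In this toy case overall accuracy exceeds the baseline while the rarest class is never predicted correctly, which is why per-class metrics are worth reporting alongside accuracy when classes are imbalanced.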

Conclusion: Both gene presence/absence and core gene genetic composition information in NTHi strains can successfully be used to predict ecological niche and disease state of origin. Future work is warranted, specifically increasing the number of genomes in classes with low representation and exploring additional methods and feature selection techniques.

Learning Objectives:

1. Define horizontal gene transfer in naturally competent bacterial species

2. Identify importance of gene possession to phenotype

3. Explain current approaches to use gene sets to predict clinical provenance

