PRINCESS: A Framework for Comprehensive Detection and Haplotype Phasing of SNPs and Structural Variants



Long-read DNA sequencing technologies, such as the Pacific Biosciences (PacBio) and Oxford Nanopore (ONT) platforms, have demonstrated enhanced detection of genomic variation, including Single Nucleotide Variants (SNVs), Structural Variants (SVs) and methylation changes. Individual studies so far have, however, focused on only one of these three classes of variation. Furthermore, only a few studies include phasing information to improve the prediction of these classes of variation and to better associate genetic variation with phenotypes. Thus, clinical and research studies both currently lack a comprehensive view of genomic variation, even though the primary data is present in their DNA sequences. Here we introduce PRINCESS, a method that provides haplotype-resolved SNVs, SVs and methylation changes based on a single long-read sequencing run from either PacBio or ONT. PRINCESS automatically adapts to different sequence coverage levels to optimally leverage the data set at hand. Thus, PRINCESS provides cost- and time-efficient, comprehensive insights into haplotype-resolved genomic variation. This information can be leveraged to simultaneously study the interaction of SNVs, SVs and methylation changes and their impact on phenotypic changes. PRINCESS was evaluated using Genome in a Bottle (GIAB) Oxford Nanopore standard and ultra-long reads as well as PacBio Continuous Long Reads (CLR) and Circular Consensus Sequencing (CCS) data. Using only one SMRT or PromethION flow cell, PRINCESS achieved high SNV precision (97.01%, 99.54%, 92.11%) and sensitivity (80.32%, 70.32%, 87.45%) for PacBio CLR, CCS and ONT PromethION, respectively, with a genotype accuracy of at least 98% across all read types. For SVs, PRINCESS also reached high precision (93%, 94%, 86%) and high sensitivity (77%, 79%, 79%). Both variant types were phased, achieving high phasing N50s of 152 kbp, 117 kbp and 17.42 Mbp for PacBio CLR, CCS and ONT PromethION, respectively.
Methylation results from ONT are currently under evaluation. Together, these results highlight the versatility and performance of PRINCESS. Applied to 18 PacBio samples with matching RNA-Seq data, PRINCESS improved the detection of SVs (22,105 on average) and SNVs as well as phasing (~5 Mbp average N50), and thus allowed the detection of eQTLs in an automated, fast and comprehensive fashion.
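For readers less familiar with the benchmark figures quoted in the abstract, the following is a minimal sketch of how precision, sensitivity and N50 are conventionally defined. It is not taken from PRINCESS, and the abstract does not state which tools produced the reported values. The function names and the example counts are hypothetical and serve only to illustrate the arithmetic.

```python
# Illustrative only: standard definitions of the metrics quoted in the abstract.
# These helpers are hypothetical and are not part of the PRINCESS code base.

def precision(true_pos, false_pos):
    """Fraction of called variants that match the truth set."""
    called = true_pos + false_pos
    return true_pos / called if called else 0.0


def sensitivity(true_pos, false_neg):
    """Fraction of truth-set variants that were called (also known as recall)."""
    expected = true_pos + false_neg
    return true_pos / expected if expected else 0.0


def n50(lengths):
    """Length L such that blocks of length >= L cover at least half the total.

    For phasing, `lengths` would be the spans of the phase blocks in bp.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input


if __name__ == "__main__":
    # Made-up counts and block lengths, chosen only to make the arithmetic easy.
    print(precision(90, 10))
    print(sensitivity(80, 20))
    print(n50([10, 20, 30, 40]))
```

With these definitions, a long N50 means that half of the phased sequence lies in blocks at least that long, which is why it is used as a summary of phasing contiguity.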

Learning Objectives:

1. PRINCESS: a one-stop solution for all variant detection

2. Comprehensive understanding of genomic variation using long reads (PacBio and Oxford Nanopore Technologies)

3. Effect of CCS insert size on Single Nucleotide and Structural Variant detection