Recent improvements in sequencing chemistry and instrument performance combine to create a new PacBio data type: highly accurate (HiFi), long-insert reads. Increased read length and improvements in library construction enable average read lengths of 10-20 kb with an average sequence identity greater than 99% from raw single-molecule reads. The resulting reads have accuracy comparable to short-read NGS but are 50-100 times longer. These highly accurate long reads allow comprehensive detection of variation, from single nucleotide polymorphisms to large structural variants, with a single data type at 15- to 30-fold coverage. Using existing variant detection pipelines (e.g. GATK) to call variants and construct phase blocks, we achieve state-of-the-art sensitivity and specificity for small variants while retaining high sensitivity for detecting larger structural variants (>50 bp) at single-base resolution and for delineating haplotype linkage. Additionally, the absence of sequence-context bias and the unambiguous mappability of the longer HiFi reads allow a more complete survey of the human genome, extending variant detection beyond the Genome in a Bottle (GIAB) high-confidence regions. We demonstrate the utility of this data type by sequencing the well-characterized HG002 genome to 15- to 30-fold coverage and calling all variants. Beyond human resequencing, HiFi reads can be used to assemble plant and animal genomes and call variants in them, with assembly results rivaling those of current long-read sequencing approaches. Because the raw reads are highly accurate, they are directly compatible with many existing bioinformatics tools.
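The accuracy and coverage figures above can be made concrete with a little arithmetic. The sketch below is a minimal illustration, not part of any PacBio or GATK tool: it converts a fractional read identity to a Phred-scaled quality value, QV = -10 log10(1 - identity), so that 99% identity corresponds to roughly Q20, and estimates how many reads are needed to reach a target fold coverage. The function names are hypothetical, and the genome size (about 3.1 Gb, an approximate haploid human genome size) and 15 kb mean read length are illustrative assumptions.

```python
import math

# Assumed values for illustration only (not taken from the study).
GENOME_SIZE_BP = 3.1e9      # approximate haploid human genome size
MEAN_READ_LEN_BP = 15_000   # a read length within the 10-20 kb range quoted above


def identity_to_qv(identity: float) -> float:
    """Return the Phred-scaled quality (QV) for a fractional read identity."""
    error_rate = 1.0 - identity
    return -10.0 * math.log10(error_rate)


def reads_for_coverage(target_coverage: float, genome_size_bp: float,
                       mean_read_len_bp: float) -> int:
    """Estimate the number of reads whose total bases reach the target fold coverage."""
    total_bases = target_coverage * genome_size_bp
    return math.ceil(total_bases / mean_read_len_bp)


if __name__ == "__main__":
    for identity in (0.99, 0.999):
        print(f"identity {identity:.3f} -> Q{identity_to_qv(identity):.0f}")
    for coverage in (15, 30):
        n_reads = reads_for_coverage(coverage, GENOME_SIZE_BP, MEAN_READ_LEN_BP)
        print(f"{coverage}-fold coverage -> about {n_reads:,} reads")
```

The estimate ignores unmapped reads, read-length variation, and uneven coverage, so real sequencing targets would need some margin above these numbers.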
1. Understand how HiFi sequence reads are generated and what makes this data type unique among sequencing technologies
2. Learn about the use cases uniquely enabled by the new HiFi sequence data type