Making the Most of Your NGS Data: Understanding Metrics for Target-enriched NGS

Introduction

Targeted next-generation sequencing (NGS) is often performed using hybridization-based target enrichment, which uses oligonucleotide probes to capture regions of interest for downstream sequencing. Although targeting reduces the amount of sequencing required, each run is still time-consuming and costly, so understanding key sequencing metrics can help you maximize the value of each run.

Beyond common metrics (e.g., base quality, cluster density, number of reads passing filter), several additional metrics provide deeper insight into the success of a sequencing run. Depth of coverage (the number of times that a particular base within the target region is represented in the sequence data) and on-target rate (the percentage of sequenced bases or reads that map to the target region) are fairly intuitive concepts. Also intuitive is the duplication rate for a sequencing run, which is the percentage of total mapped reads that are duplicates (reads mapped to the exact same location, including the coordinates of both the 5’ and 3’ ends). This article focuses on two less-intuitive metrics, GC bias and Fold-80 base penalty, and offers some tips on how to improve them.
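As a rough illustration of the three intuitive metrics, the sketch below computes each one from toy inputs. The function names and the (chromosome, start, end) read representation are assumptions made for this example; production tools work from aligned BAM files and usually also consider strand and mate position when marking duplicates.

```python
from collections import Counter


def mean_depth(per_base_depth):
    """Mean depth of coverage over a list of per-base depths in the target."""
    return sum(per_base_depth) / len(per_base_depth)


def on_target_rate(on_target_bases, total_mapped_bases):
    """Percentage of mapped bases that fall inside the target region."""
    return 100.0 * on_target_bases / total_mapped_bases


def duplication_rate(read_positions):
    """Percentage of mapped reads that repeat an already-seen position.

    Each read is a hypothetical (chrom, start, end) tuple. Reads that share
    all three values are treated as duplicates of the first one seen.
    """
    counts = Counter(read_positions)
    duplicates = sum(n - 1 for n in counts.values())
    return 100.0 * duplicates / len(read_positions)


# Toy data: four target bases and five mapped reads (three share one position).
depths = [10, 20, 30, 40]
reads = [
    ("chr1", 100, 250),
    ("chr1", 100, 250),
    ("chr1", 100, 260),
    ("chr2", 5, 155),
    ("chr1", 100, 250),
]

print(mean_depth(depths))          # 25.0
print(on_target_rate(750, 1000))   # 75.0
print(duplication_rate(reads))     # 40.0
```

In this toy data, three reads share one position, so two of the five reads count as duplicates, giving a duplication rate of 40%.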

GC bias

GC content, the proportion of bases in a region that are guanine or cytosine, varies across a genome, which contains both AT-rich and GC-rich regions. During sequencing, regions of high or low GC content are often sequenced unevenly, resulting in disproportionate coverage of these regions; this is known as GC bias. GC bias across regions of variable GC content can be visualized in GC-bias distribution plots, which typically show normalized coverage as a function of GC content (Figure 1).
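The idea behind such a plot can be sketched in a few lines: group genomic windows by GC content and compare the mean coverage in each group with the overall mean. The window representation (a sequence paired with its mean coverage), the function names, and the choice of five GC bins are assumptions made for this example, not part of any particular tool.

```python
from collections import defaultdict


def gc_fraction(seq):
    """Fraction of called bases (A, C, G, T) in seq that are G or C."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def gc_bias_profile(windows, n_bins=5):
    """Normalized coverage per GC bin.

    windows is a list of (sequence, mean_coverage) pairs. A value of 1.0
    means the bin is covered at the overall mean; values below 1.0 mean
    windows in that GC range are under-represented.
    """
    overall_mean = sum(cov for _, cov in windows) / len(windows)
    by_bin = defaultdict(list)
    for seq, cov in windows:
        # Clamp so that a GC fraction of exactly 1.0 lands in the top bin.
        idx = min(int(gc_fraction(seq) * n_bins), n_bins - 1)
        by_bin[idx].append(cov)
    return {
        idx: (sum(covs) / len(covs)) / overall_mean
        for idx, covs in sorted(by_bin.items())
    }


# Toy windows: one AT-rich, two balanced, one GC-rich.
windows = [
    ("ATATATATAT", 60),
    ("ATGCATGCAT", 100),
    ("AATTGGCCAT", 120),
    ("GCGCGCGCGC", 40),
]

for idx, norm_cov in gc_bias_profile(windows).items():
    print(f"GC bin {idx}: normalized coverage {norm_cov:.3f}")
```

In this toy data, the AT-rich and GC-rich windows fall below a normalized coverage of 1.0 while the balanced windows sit above it, which is the characteristic shape of a GC-biased run.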

High levels of GC bias can be introduced during library preparation (especially in workflows dependent on PCR), during hybrid capture, or during the sequencing run itself. Because under-represented regions need extra sequencing to reach the required depth, this bias increases the amount of sequencing that must be performed, driving up expense; thus, it is important to choose a library preparation kit that minimizes GC bias.

Fold-80 base penalty

Analysis of sequencing data typically reveals that some target regions have achieved higher coverage than others. The Fold-80 base penalty metric is one way to assess coverage uniformity. Once the mean target coverage is determined for an experiment, the Fold-80 base penalty describes how much more sequencing is required to bring 80% of the target bases to that mean coverage. Thus, a run with perfect coverage uniformity would have a Fold-80 base penalty of 1, meaning that every target base is covered at the mean depth (see Figure 2). Values greater than 1 indicate uneven coverage, and the higher the value, the poorer the uniformity. For instance, a Fold-80 value of 2 means that twice as much (2-fold) sequencing is required for 80% of the target bases to reach the mean coverage.
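One common way to compute this metric, assumed here rather than taken from this article, is to divide the mean target coverage by the coverage value that 80% of target bases meet or exceed (the 20th percentile of per-base coverage). The sketch below applies that convention to a list of per-base depths; the function name is hypothetical, and real tools may differ in details such as how zero-coverage targets are handled.

```python
def fold_80_base_penalty(per_base_depth):
    """Mean target coverage divided by the depth reached by 80% of bases."""
    if not per_base_depth:
        raise ValueError("per_base_depth must not be empty")
    depths = sorted(per_base_depth)
    n = len(depths)
    mean = sum(depths) / n
    # Number of bases that must meet the threshold: ceil(0.8 * n),
    # computed with integers to avoid floating-point rounding.
    covered = (4 * n + 4) // 5
    # Largest depth that at least `covered` bases meet or exceed.
    threshold = depths[n - covered]
    if threshold == 0:
        # More than 20% of bases are uncovered; no finite scaling helps.
        return float("inf")
    return mean / threshold


uniform = [100] * 10
uneven = [20, 40, 50, 100, 100, 100, 100, 150, 160, 180]

print(fold_80_base_penalty(uniform))  # 1.0
print(fold_80_base_penalty(uneven))   # 2.0
```

Both toy runs have a mean coverage of 100. In the uneven run, however, 80% of the bases reach a depth of only 50, so sequencing would have to double for those bases to reach the mean.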

The Fold-80 base penalty provides information about the capture efficiency of the probes in the panel, which is impacted by both probe design and probe quality. To decrease the Fold-80 base penalty and reduce the need for additional, costly sequencing runs, use high-quality, well-designed probes.

Understanding sequencing metrics can help you to get the most out of valuable sequencing resources, including time, money, and precious samples. To watch short videos about the five metrics mentioned here, and to learn about other aspects of NGS, visit: https://go.roche.com/Targeted-NGS-Metrics