Improving Genomic Analysis by Fixing Reference Errors

The current version of the human reference genome, GRCh38, contains a number of errors, including 1.2 Mbp of falsely duplicated and 8.04 Mbp of collapsed regions. These errors affect variant calling in 33 protein-coding genes, 12 of which are medically relevant. We present a modified GRCh38 reference that corrects these errors while keeping the same coordinates, so the extensive existing annotations of GRCh38 remain usable. We also present an efficient remapping approach, FixItFelix, that enables quick re-analysis of existing data against the modified reference without changing coordinates. We showcase these improvements on multi-ethnic control samples across short- and long-read DNA and RNA sequencing, demonstrating gains for population variant calling as well as eQTL studies.
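To illustrate what correcting a reference "while keeping the same coordinates" can mean in practice, here is a minimal Python sketch. It assumes that a falsely duplicated copy is neutralized by overwriting it with `N` bases, which is one common way to remove a sequence from alignment without shifting positions. Whether the modified GRCh38 reference is built this way is an assumption, not something stated above; the function name `mask_regions` and the toy sequence are hypothetical.

```python
def mask_regions(sequence: str, regions: list[tuple[int, int]]) -> str:
    """Overwrite each 0-based, half-open [start, end) interval with 'N'.

    The sequence length is unchanged, so every downstream coordinate
    (and any annotation keyed on it) stays valid.
    Assumption: masking stands in for whatever correction is actually applied.
    """
    bases = list(sequence)
    for start, end in regions:
        if not 0 <= start <= end <= len(bases):
            raise ValueError(f"interval [{start}, {end}) is outside the sequence")
        bases[start:end] = "N" * (end - start)
    return "".join(bases)


# Toy example: mask a hypothetical false duplication at positions 2..4.
original = "ACGTACGTAC"
masked = mask_regions(original, [(2, 5)])
print(masked)
```

Because the masked sequence has the same length as the original, a gene annotated at a given position is still found at that position after the correction.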

Learning objectives

1. Discuss some of the issues with the GRCh38 human reference genome.

2. Explain how errors in the reference can be corrected and how their impact on analysis can be reduced.

3. Discuss a better remapping approach for re-analysis using the fixed reference.