MAY 09, 2019 9:00 AM PDT

Completing the Human Genome: The Progress and Challenge of Satellite DNA Assembly



The release of the first human genome assembly was a landmark achievement, and after nearly two decades of improvements, the current human reference genome (GRCh38) is the most accurate and complete vertebrate genome ever produced. However, no single chromosome has yet been finished end to end, and hundreds of gaps persist across the genome. This is a fundamental problem because the sequences within these gaps vary in repeat structure and copy number between individuals, and this variation can affect genome stability and health.

To address this challenge, I will present a whole-genome de novo assembly that surpasses the continuity of GRCh38, along with the first complete, telomere-to-telomere assembly of a human X chromosome. We have collected 40X coverage of ultra-long Oxford Nanopore sequencing for the CHM13hTERT cell line, including 44 Gb of sequence in reads >100 kb and a maximum read length exceeding 1 Mb. This unprecedented coverage of ultra-long reads enabled the resolution of most repeats in the genome, including large fractions of the centromeric satellite arrays and the short arms of the acrocentric chromosomes. Using this assembly as a basis, we chose to manually finish the X chromosome. These results demonstrate that it is now possible to finish entire human chromosomes without gaps, and our future work with the Telomere-to-Telomere (T2T) Consortium will focus on completing and validating the remainder of the genome.
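
To put the sequencing figures above in perspective, the short sketch below converts a base count into fold coverage. It assumes a haploid genome size of about 3.1 Gb, a round figure that is not stated in this abstract, and the function name `fold_coverage` is purely illustrative. Under that assumption, the 44 Gb of sequence in reads >100 kb corresponds to roughly 14X coverage from ultra-long reads alone.

```python
# Rough coverage arithmetic for the figures quoted in the abstract.
# ASSUMPTION: a haploid human genome size of ~3.1 Gb (not given in the abstract).
ASSUMED_GENOME_SIZE_BP = 3.1e9


def fold_coverage(total_bases, genome_size_bp=ASSUMED_GENOME_SIZE_BP):
    """Return fold coverage: sequenced bases divided by genome size."""
    return total_bases / genome_size_bp


# 44 Gb of sequence in reads longer than 100 kb (from the abstract).
ul_bases = 44e9
ul_coverage = fold_coverage(ul_bases)

# Total bases implied by 40X coverage under the assumed genome size.
total_bases_implied = 40 * ASSUMED_GENOME_SIZE_BP

print(f"Coverage from reads >100 kb: {ul_coverage:.1f}X")
print(f"Total bases implied by 40X: {total_bases_implied / 1e9:.0f} Gb")
```

A different genome size estimate shifts these values proportionally, so they are best read as order-of-magnitude checks rather than reported results.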

Finally, centromeric sequences are expected to vary in repeat composition and copy number between individuals in the population. To study the extent of this variation, I have performed a comprehensive study of structural variation in centromere sequences using a panel of high-coverage, long-read datasets from individuals from diverse populations. I will also discuss efforts to increase production of ultra-long (UL) read sequencing on the Oxford Nanopore PromethION platform, which would dramatically increase our ability to characterize satellite array structure.

Learning Objectives: 

1. Human centromere sequence structure and organization.
2. Long-read sequencing and scaffolding assembly strategies to complete human chromosome assemblies.
