We are entering an exciting era of genomics in which truly complete, high-quality assemblies of human chromosomes are available end-to-end, or from ‘telomere-to-telomere’ (T2T). Recently, the Telomere-to-Telomere (T2T) consortium announced our v1.0 assembly, which includes more than 150 Mbp of novel sequence compared to GRCh38, achieves near-perfect sequence accuracy, and unlocks the most complex regions of the genome to functional study. This technological advance, made possible by the confluence of new assembly methods and long-read sequencing technologies, offers a new opportunity to comprehensively study the genomic structure and epigenetic organization of the most repeat-dense regions of our chromosomes. In particular, I will focus on the release of an initial genetic and epigenetic reference of all human centromeric regions. High-resolution study of pericentromeric sequence content and organization reveals new satellite families, sites of transposable element insertion, segmental duplications, and pericentromeric gene predictions. Using unique markers (a marker-assisted method) to anchor ultra-long nanopore reads to human centromeric regions, we report hypomethylated dips at every centromeric region, as previously described for the T2T chromosome X centromere. These sites are shown to coincide with regions enriched in centromere protein A (CENP-A) and may provide a signature of sites of kinetochore assembly genome-wide.
1. Understand the incomplete nature of the human reference genome: which sequences are missing and how their absence affects basic and translational research.
2. Explain how long-read sequencing technologies differ in their ability to assemble repeats.
3. Describe the sequences and epigenetic patterns in the parts of our genome that are not represented in current reference maps (e.g., GRCh38).