Human chromosome 19q13.4 contains the genes encoding killer-cell immunoglobulin-like receptors (KIR). Single nucleotide variation, structural variation, sequence homology, and repetitive elements make this region difficult to align accurately beyond the level of single gene alleles. The 68 reference haplotypes in the human genome reference range in length from 67 to 269 kilobases and contain 4 to 18 genes. We leveraged these references, together with tools from our long-read KIR haplotype assembly algorithm, to define and align KIR haplotypes at <5 kb resolution on average. We then used a standard alignment algorithm to refine that alignment to single-base resolution. This processing showed that the high-level alignment recapitulates the human-curated annotation of the human haplotypes, as well as that of a chimpanzee haplotype. Further, the assignments and alignments of gene alleles were consistent with their human curation in haplotype and allele databases. These results define KIR haplotypes as 14 loci containing 9 genes. The multiple sequence alignments have been applied in two workflows, as in silico gene markers and as in vitro capture probes. The first workflow is an efficient computational approach for in silico KIR probe interpretation (KPI) that accurately determines an individual's KIR genes and haplotype pairs from KIR sequencing reads. The second workflow efficiently captures, sequences, and assembles diploid human KIR haplotypes from PacBio HiFi reads.
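To illustrate the general idea of probe-based gene interpretation, here is a minimal sketch. It is not the KPI implementation. It assumes gene-specific probe sequences of a fixed length k, counts the reads that contain each gene's probes on either strand, and calls a gene present when that count reaches a threshold. The genes `gene_A` and `gene_B`, their toy probe sequences, the function names, and the `min_reads` threshold are all hypothetical. Inferring haplotype pairs from these calls is not shown.

```python
from collections import Counter

# Complement table for uppercase DNA; other characters (e.g. N) pass through unchanged.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int):
    """Yield every k-length substring of seq (nothing if seq is shorter than k)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def probe_hits(reads, probes, k):
    """Count, per gene, the reads containing at least one of its probes on either strand.

    Hypothetical assumption: every probe has length k and is unique to one gene.
    If a probe were shared by two genes, the later gene would overwrite the earlier one.
    """
    index = {}
    for gene, seqs in probes.items():
        for p in seqs:
            if len(p) != k:
                raise ValueError(f"probe {p!r} for {gene} is not of length {k}")
            index[p] = gene
            index[revcomp(p)] = gene
    hits = Counter()
    for read in reads:
        # Use a set so a read counts at most once per gene.
        genes_in_read = {index[km] for km in kmers(read, k) if km in index}
        hits.update(genes_in_read)
    return hits


def call_genes(hits, genes, min_reads):
    """Call each gene present (True) if at least min_reads reads carried its probes."""
    return {g: hits.get(g, 0) >= min_reads for g in genes}


if __name__ == "__main__":
    # Toy probes and reads for hypothetical genes; these are not real KIR sequences.
    probes = {"gene_A": ["AAACCC"], "gene_B": ["GGGTTA"]}
    reads = ["TTAAACCCGG", "CCGGGTTTAA", "ACGTACGTAC"]
    hits = probe_hits(reads, probes, k=6)
    print(call_genes(hits, ["gene_A", "gene_B"], min_reads=2))
```

In the toy run, the first read carries the `gene_A` probe on the forward strand and the second carries its reverse complement. `gene_A` is therefore supported by two reads, while `gene_B` is supported by none. Counting reads rather than raw k-mer matches keeps one repetitive read from inflating the evidence for a gene.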
1. Identify the types of variation in the KIR region
2. Explain how this variation affects sequence interpretation
3. Describe how the outputs of the two WGS interpretation algorithms differ