The Genome in a Bottle Consortium (GIAB) has characterized an Ashkenazi trio from the Personal Genome Project (NIST Reference Material 8392) with 12 short-, long-, and linked-read sequencing and mapping methods. The datasets are public, without embargo, for analysis and methods development by the community. We have characterized ~3.7 million small variants, as well as reference calls for ~90% of the genome, with estimated error rates of ~2 false positives (FPs) and 2 false negatives (FNs) per million variants. To extend this characterization to larger indels and structural variants (SVs), we collected analyses of variants >=20 bp from 33 bioinformatics methods and five technologies. Nineteen discovery and refinement methods produced sequence-resolved calls using local or global assembly or split reads, giving precise predictions of deletion breakpoints, inserted sequences, and complex changes. We designed an integration approach to address the challenges of comparing and evaluating large variants, which are frequently in tandem repeats (>50% of all calls) and not precisely characterized. We have iteratively refined our integration process based on feedback on publicly released draft “straw man” callsets. Current work includes developing two-tiered high-confidence variant calls and a high-confidence BED file for benchmarking SVs, as well as a web app for crowd-sourced manual curation of SVs. These results represent a significant step in GIAB's work toward improved benchmarking of large variants in research and clinical settings.