"Machine Learning Applied to Single-Molecule Electronic DNA Mapping for Structural Variant Verification in Human Genomes" - ASHG 2017

November 17, 2017

The importance of structural variation in human disease and the difficulty of detecting structural variants larger than 50 base pairs have led to the development of several long-read sequencing technologies and optical mapping platforms. Frequently, multiple technologies and ad hoc methods are required to reach a consensus on the location, size, and nature of a structural variant, and no single approach reliably bridges the gap in variant sizes between those readily detected with NGS technologies and the largest rearrangements observed with optical mapping. Often, structural variants larger than 10 kilobases are not detected.

To address this unmet need, we have developed a new software package, SV-Verify™, which uses data collected with the Nabsys High Definition Mapping (HD-Mapping™) system to perform hypothesis-based verification of putative deletions. We demonstrate that whole-genome maps, constructed from data generated by electronic detection of tagged DNA hundreds of kilobases in length, can be used effectively to facilitate calling of structural variants ranging in size from 300 base pairs to hundreds of kilobase pairs. SV-Verify implements hypothesis-based verification of putative structural variants using supervised machine learning. The machine learning is realized as a set of support vector machines capable of concurrently testing several thousand independent hypotheses. We describe support vector machine training using 1089 deletions and 4637 negative controls from a well-characterized human genome. Plots of specificity versus sensitivity for each of the support vector machines will be presented. We subsequently applied the trained classifiers to another human genome, evaluating more than 5000 putative deletions and demonstrating high sensitivity and specificity for deletions from 300 base pairs to hundreds of kilobases. Over 78% of deletions called by three or more technologies were confirmed by SV-Verify.
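
As a rough illustration of how a specificity-versus-sensitivity plot for one classifier could be derived, the sketch below sweeps a decision threshold over classifier scores for labeled deletions and negative controls. It is a minimal sketch and not the SV-Verify implementation: the function names, scores, and labels are hypothetical toy values, and the scores merely stand in for the kind of signed decision value a support vector machine produces.

```python
from typing import List, Tuple


def sensitivity_specificity(
    scores: List[float], labels: List[int], threshold: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when score >= threshold means 'deletion confirmed'.

    labels: 1 = true deletion, 0 = negative control (hypothetical toy labels).
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def tradeoff_curve(
    scores: List[float], labels: List[int]
) -> List[Tuple[float, float, float]]:
    """Evaluate every observed score as a threshold.

    Returns (threshold, sensitivity, specificity) tuples in ascending threshold order.
    """
    return [
        (t, *sensitivity_specificity(scores, labels, t))
        for t in sorted(set(scores))
    ]


# Toy decision scores (assumed values, not real mapping data).
scores = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]
labels = [0, 0, 1, 0, 1, 1]

curve = tradeoff_curve(scores, labels)
for threshold, sens, spec in curve:
    print(f"threshold={threshold:+.1f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```

Raising the threshold trades sensitivity for specificity; plotting the resulting pairs for each classifier gives a curve of the kind the abstract describes.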
