SPAdes-Directed Genome Assembly Using Short Reads and High-Definition Maps
The increased use of next-generation sequencing has produced many genome assemblies; however, they are often incomplete and fragmented. While long-read technologies have improved assembly contiguity, they are more expensive and still lack sufficient read length to resolve complex structures. Optical mapping technologies have been used to scaffold and improve assemblies, but their limited resolution and precision restrict their utility, especially in combination with short-read data. To provide the necessary long-range information while maintaining sufficient resolution to complement next-generation sequencing technologies, Nabsys has developed its HD-Mapping™ platform to construct electronic whole-genome maps. By analyzing reads that are hundreds of kilobases in length, electronic detection preserves long-range information while simultaneously achieving single-molecule resolution better than 300 bp.
Consensus maps, or de novo map assemblies, constructed from Nabsys data accurately represent the distances between tags affixed to single molecules. For a well-characterized microbial strain, we compare interval sizes from the de novo assembly consensus to interval sizes determined from the reference sequence, showing an R-squared better than 0.9999 for intervals ≥300 bp. Because single-molecule false negatives and false positives are stochastic, they do not recur consistently across the molecules covering a given locus, and the consensus contains 0 false positives and 0 false negatives for intervals >500 bp.
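The interval-size comparison described above can be illustrated with a minimal Python sketch. The abstract does not state how R-squared was computed; this sketch assumes the squared Pearson correlation between paired interval sizes and a one-to-one correspondence between reference and consensus tags. The function names and all tag positions are hypothetical and serve only as illustration.

```python
def paired_intervals(ref_positions, map_positions, min_size=300):
    """Pair consecutive-tag intervals from a reference and a consensus map.

    Assumes both lists hold the same tags in one-to-one correspondence
    (an assumption of this sketch). Intervals whose reference size is
    below min_size (bp) are dropped.
    """
    if len(ref_positions) != len(map_positions):
        raise ValueError("tag lists must have the same length")
    ref = sorted(ref_positions)
    obs = sorted(map_positions)
    pairs = []
    for i in range(len(ref) - 1):
        ref_size = ref[i + 1] - ref[i]
        obs_size = obs[i + 1] - obs[i]
        if ref_size >= min_size:
            pairs.append((ref_size, obs_size))
    return pairs


def r_squared(reference, measured):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(reference)
    if n != len(measured) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_ref = sum(reference) / n
    mean_obs = sum(measured) / n
    cov = sum((r - mean_ref) * (m - mean_obs) for r, m in zip(reference, measured))
    var_ref = sum((r - mean_ref) ** 2 for r in reference)
    var_obs = sum((m - mean_obs) ** 2 for m in measured)
    if var_ref == 0 or var_obs == 0:
        raise ValueError("interval sizes must not all be identical")
    return (cov * cov) / (var_ref * var_obs)


# Hypothetical tag positions in bp; not data from the study.
reference_tags = [0, 1200, 1450, 5200, 9800]
consensus_tags = [0, 1195, 1452, 5210, 9790]

pairs = paired_intervals(reference_tags, consensus_tags, min_size=300)
ref_sizes = [p[0] for p in pairs]
obs_sizes = [p[1] for p in pairs]
print(f"{len(pairs)} intervals compared, R-squared = {r_squared(ref_sizes, obs_sizes):.6f}")
```

In this toy input the 250 bp reference interval falls below the 300 bp cutoff and is excluded, mirroring the size threshold quoted for the comparison.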
Nabsys electronic single-molecule reads have been integrated with short-read data in a hybrid assembler to improve assembly completeness. We will describe results from hybrid assembly of Nabsys HD maps and Illumina short-read sequence data using the SPAdes assembler. We demonstrate that hybrid assembly of short reads and Nabsys electronic maps significantly improves the contiguity and quality of genome assemblies.