Graph-based genome reference representations have seen significant development, motivated by the inability of the current human genome reference to represent the genetic diversity of different human populations and its failure to maintain the same level of accuracy for non-European ancestries. While there have been many efforts to develop computationally efficient graph-based toolkits for NGS read alignment and variant calling, methods for curating genomic variants and constructing genome graphs from them remain understudied, even though they ultimately determine the effectiveness of the overall bioinformatics pipeline. In this study, we discuss obstacles encountered during graph construction and propose methods for sample selection based on population diversity, for graph augmentation with structural variants, and for resolving graph reference ambiguity caused by information overload.
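To illustrate what a genome graph adds over a linear reference, the following is a minimal toy sketch, not the construction method proposed in the study. It assumes only biallelic SNPs at distinct, non-overlapping positions (no structural variants), and the names `VariationGraph`, `build_graph` and `haplotypes` are hypothetical. Each SNP becomes a two-node "bubble" (reference base and alternate base) between shared reference segments, so every source-to-sink path spells one haplotype.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VariationGraph:
    """Toy sequence graph (hypothetical): node id -> DNA segment, plus directed edges."""
    nodes: dict[int, str] = field(default_factory=dict)
    edges: dict[int, list[int]] = field(default_factory=dict)

    def add_node(self, seq: str) -> int:
        node_id = len(self.nodes)
        self.nodes[node_id] = seq
        self.edges[node_id] = []
        return node_id

    def add_edge(self, src: int, dst: int) -> None:
        self.edges[src].append(dst)


def build_graph(reference: str, snps: list[tuple[int, str]]) -> tuple[VariationGraph, list[int]]:
    """Turn a linear reference plus biallelic SNPs (0-based pos, alt base) into a graph.

    Returns the graph and its source nodes (nodes with no predecessor).
    """
    graph = VariationGraph()
    tails: list[int] = []   # nodes whose ends must link to the next element
    heads: list[int] = []   # source nodes where haplotype paths begin
    cursor = 0              # first reference position not yet placed in the graph

    def attach(new_ids: list[int]) -> None:
        nonlocal tails
        if tails:
            for t in tails:
                for n in new_ids:
                    graph.add_edge(t, n)
        else:
            heads.extend(new_ids)
        tails = new_ids

    for pos, alt in sorted(snps):
        if pos < cursor or reference[pos] == alt:
            raise ValueError(f"overlapping or non-variant SNP at {pos}")
        if pos > cursor:                                   # shared reference segment
            attach([graph.add_node(reference[cursor:pos])])
        attach([graph.add_node(reference[pos]), graph.add_node(alt)])  # SNP bubble
        cursor = pos + 1
    if cursor < len(reference):                            # trailing reference segment
        attach([graph.add_node(reference[cursor:])])
    return graph, heads


def haplotypes(graph: VariationGraph, heads: list[int]) -> list[str]:
    """Spell every source-to-sink path; exponential in the number of bubbles."""
    spelled: list[str] = []

    def walk(node: int, prefix: str) -> None:
        seq = prefix + graph.nodes[node]
        if not graph.edges[node]:
            spelled.append(seq)
        for nxt in graph.edges[node]:
            walk(nxt, seq)

    for head in heads:
        walk(head, "")
    return spelled
```

For example, `build_graph("ACGTACGT", [(2, "T"), (5, "G")])` yields seven nodes and four spelled haplotypes, one of which is the unchanged reference. Because path enumeration doubles with every added bubble, it is only suitable for toy inputs.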
The precisionFDA Truth Challenge V2 aimed to assess the state of the art of variant calling in challenging genomic regions. Starting with FASTQs, 20 challenge participants applied their variant-calling pipelines and submitted 64 variant call sets for one or more sequencing technologies (Illumina, PacBio HiFi, and Oxford Nanopore Technologies). Submissions were evaluated following best practices for benchmarking small variants with updated Genome in a Bottle benchmark sets and genome stratifications.
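The headline metrics in small-variant benchmarking are precision, recall and their harmonic mean (F1). The sketch below computes them under a deliberately naive assumption: a call counts as a true positive only if chromosome, position, REF and ALT match a truth record exactly, and only variants inside the benchmark regions are scored. Dedicated benchmarking tools such as hap.py and vcfeval perform haplotype-aware comparison that tolerates different representations of the same variant, which this sketch does not attempt. `Variant`, `in_regions` and `benchmark` are hypothetical names.

```python
from __future__ import annotations

from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int   # 1-based, as in VCF
    ref: str
    alt: str


def in_regions(v: Variant, regions: list[tuple[str, int, int]]) -> bool:
    """True if the variant's first base lies in a BED-style (0-based, half-open) region."""
    return any(v.chrom == c and start <= v.pos - 1 < end for c, start, end in regions)


def benchmark(truth: list[Variant], query: list[Variant],
              regions: list[tuple[str, int, int]]) -> dict[str, float]:
    """Exact-match scoring restricted to benchmark regions (no haplotype-aware matching)."""
    truth_set = {v for v in truth if in_regions(v, regions)}
    query_set = {v for v in query if in_regions(v, regions)}
    tp = len(truth_set & query_set)
    fp = len(query_set - truth_set)   # calls absent from the truth set
    fn = len(truth_set - query_set)   # truth variants that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}
```

For example, with three truth variants and a query that recovers two of them, adds one false call inside the benchmark regions and one call outside them, the out-of-region call is ignored and precision, recall and F1 are each 2/3.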
The analysis of equine electrocardiographic (ECG) recordings is complicated by the absence of agreed criteria for classifying abnormalities. We explore the applicability of several complexity analysis methods for characterizing non-linear aspects of ECG recordings. Here we show that complexity estimates obtained with Lempel-Ziv '76, Titchener's T-complexity and Lempel-Ziv '78 analysis of ECG recordings from healthy Thoroughbred horses are highly dependent on the duration of the analysed ECG fragments and on heart rate.
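Lempel-Ziv complexity is computed on a symbol sequence, so an ECG trace must first be discretized. The sketch below assumes binarization about the median (a common choice, not necessarily the one used in the study), counts LZ76 components with the Kaspar-Schuster scanning scheme, and counts LZ78 phrases by incremental dictionary parsing; Titchener's T-complexity is not shown. The function names are hypothetical.

```python
from __future__ import annotations

import math
import statistics


def binarize(signal: list[float]) -> str:
    """Map samples above the median to '1' and the rest to '0' (assumed scheme)."""
    threshold = statistics.median(signal)
    return "".join("1" if x > threshold else "0" for x in signal)


def lz76_complexity(s: str) -> int:
    """Count LZ76 components using the Kaspar-Schuster (1987) scanning scheme."""
    n = len(s)
    if n < 2:
        return n
    # i: start of a candidate earlier match, k: current match length + 1,
    # l: start of the component being built, k_max: longest match found for it
    i, k, l = 0, 1, 1
    c, k_max = 1, 1            # the first symbol is always its own component
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:      # reached the end while still copying: final component
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:         # every earlier start tried: close the component
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def lz76_normalized(s: str) -> float:
    """Normalize by n / log2(n), the asymptotic LZ76 count for a random binary string."""
    n = len(s)
    return lz76_complexity(s) * math.log2(n) / n if n > 1 else 0.0


def lz78_phrase_count(s: str) -> int:
    """Count phrases in an LZ78 parse (each phrase = a known phrase + one new symbol)."""
    phrases: set[str] = set()
    current = ""
    for ch in s:
        current += ch
        if current not in phrases:
            phrases.add(current)
            current = ""
    # A trailing, already-seen phrase still counts as one final phrase.
    return len(phrases) + (1 if current else 0)
```

For the classic example string `"0001101001000101"`, the LZ76 parse is 0·001·10·100·1000·101 (six components). Raw counts grow with sequence length; dividing by n/log2(n) is a common normalization, though it does not by itself guarantee estimates that are independent of fragment duration.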
The human reference genome serves as the foundation for genomics by providing a scaffold for alignment of sequencing reads, but currently only reflects a single consensus haplotype, thus impairing analysis accuracy. Here we present a graph reference genome implementation that enables read alignment across 2,800 diploid genomes encompassing 12.6 million SNPs and 4.