Affordable genotyping methods are essential in genomics. Commonly used genotyping methods primarily support single nucleotide variants and short indels but neglect structural variants. Additionally, accuracy of read alignments to a reference genome is unreliable in highly polymorphic and repetitive regions, further impacting genotyping performance.
Modern pangenome graphs are built using haplotype-resolved genome assemblies. When mapping reads to a pangenome graph, prioritizing alignments that are consistent with the known haplotypes improves genotyping accuracy. However, the existing rigorous formulations for colinear chaining and alignment problems do not consider the haplotype paths in a pangenome graph.
Modern genomic datasets, like those generated under the 1000 Genomes Project, contain millions of variants belonging to known haplotypes. Although these datasets are more representative than a single reference sequence and can alleviate issues like reference bias, they are significantly more computationally burdensome to work with, often involving large indexed genome graph data structures for tasks such as read mapping. The construction, preprocessing, and mapping algorithms can require substantial computational resources depending on the size of these variant sets.
A major challenge for density functional theory (DFT) is its failure to treat static correlation, yielding errors in predicted charges, band gaps, van der Waals forces, and reaction barriers. Here we combine one- and two-electron reduced density matrix (1- and 2-RDM) theories with DFT to obtain a universal O(N^{3}) generalization of DFT for static correlation. Using the lowest unitary invariant of the cumulant 2-RDM, we generate a 1-RDM functional theory that corrects the convexity of any DFT functional to capture static correlation in its fractional orbital occupations.
The problem of aligning a sequence to a walk in a labeled graph is of fundamental importance to Computational Biology. For an arbitrary graph G = (V, E) and a pattern P of length m, a lower bound based on the Strong Exponential Time Hypothesis implies that an algorithm for finding a walk in G exactly matching P significantly faster than O(|E|m) time is unlikely. However, for many special graphs, such as de Bruijn graphs, the problem can be solved in linear time.
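For orientation, the exact-matching problem described above can be solved on an arbitrary graph by a simple dynamic program that scans every edge once per pattern character. The following is a minimal sketch of that baseline, not the method of the paper. It assumes a directed graph in which each node carries a single-character label; the function name `has_matching_walk` and the `labels`/`edges` encoding are hypothetical choices made for illustration.

```python
def has_matching_walk(labels, edges, pattern):
    """Return True if some walk in the graph spells `pattern` exactly.

    labels:  dict mapping node -> single-character label (assumed encoding)
    edges:   iterable of directed (u, v) pairs
    pattern: string to match against node labels along a walk
    """
    if not pattern:
        return True  # the empty pattern is matched by the empty walk
    edges = list(edges)
    # Nodes at which a walk matching pattern[:1] can end.
    current = {v for v, c in labels.items() if c == pattern[0]}
    for ch in pattern[1:]:
        # Extend every partial walk by one edge; one full edge scan per character.
        current = {v for u, v in edges if u in current and labels[v] == ch}
        if not current:
            return False
    return True


# Small example graph: a cycle 0 -> 1 -> 2 -> 0 plus an exit edge 2 -> 3.
labels = {0: "A", 1: "C", 2: "G", 3: "T"}
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
print(has_matching_walk(labels, edges, "ACGACGT"))
print(has_matching_walk(labels, edges, "ACT"))
```

Because a walk may revisit nodes, the pattern can be longer than the number of nodes, as the first call illustrates. Each pattern character triggers one pass over the edge list, so the running time is proportional to the number of edges times the pattern length, plus an initial pass over the nodes.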