Background: Recent advances in long-read sequencing technologies have enabled accurate and contiguous assemblies of large genomes and metagenomes. However, even long and accurate high-fidelity (HiFi) reads do not resolve repeats that are longer than the read length. This limitation reduces the contiguity of diploid genome assemblies, since the two haplomes share many long identical regions.
Gene knockout studies suggest that ~300 genes in a bacterial genome and ~1,100 genes in a yeast genome cannot be deleted without loss of viability. These single-gene knockout experiments do not account for negative genetic interactions, when two or more genes can each be deleted without effect, but their joint deletion is lethal. Thus, large-scale single-gene deletion studies underestimate the size of a minimal gene set compatible with cell survival.
Even though recent advances in 'complete genomics' have revealed previously inaccessible genomic regions, the analysis of variation in centromeres and other extra-long tandem repeats (ETRs) faces an algorithmic challenge, since there are currently no tools for accurate sequence comparison of ETRs. Counterintuitively, classical alignment approaches, such as the Smith-Waterman algorithm, fail to construct biologically adequate alignments of ETRs. We present UniAligner, a parameter-free sequence alignment algorithm with sequence-dependent alignment scoring that automatically changes for any pair of compared sequences.
The advent of long and accurate "HiFi" reads has greatly improved our ability to generate complete metagenome-assembled genomes (MAGs), enabling "complete metagenomics" studies that were nearly impossible to conduct with short reads. In particular, HiFi reads simplify the identification and phasing of mutations in MAGs: It is increasingly feasible to distinguish between positions that are prone to mutations and positions that rarely ever mutate, and to identify co-occurring groups of mutations. However, the problems of identifying rare mutations in MAGs, estimating the false-discovery rate (FDR) of these identifications, and phasing identified mutations remain open in the context of HiFi data.