Publications by authors named "T Pupko"

Article Synopsis
  • The study explores a novel method for multiple sequence alignments in bioinformatics using natural language processing (NLP) techniques.
  • Researchers developed BetaAlign, a deep learning aligner that outperforms traditional alignment algorithms and offers highly accurate results by leveraging transformer models.
  • The findings highlight the potential of AI-based approaches to improve alignment tasks and advance phylogenomics, with training data and tools made available through Hugging Face.
View Article and Find Full Text PDF

Introns are highly prevalent in most eukaryotic genomes. Despite the accumulating evidence for benefits conferred by the possession of introns, their specific roles and functions, as well as the processes shaping their evolution, are still only partially understood. Here, we explore the evolution of the eukaryotic intron-exon gene structure by focusing on several key features such as the intron length, the number of introns, and the intron-to-exon length ratio in protein-coding genes.

View Article and Find Full Text PDF

Insertions and deletions constitute the second most important source of natural genomic variation. Insertions and deletions make up to 25% of genomic variants in humans and are involved in complex evolutionary processes including genomic rearrangements, adaptation, and speciation. Recent advances in long-read sequencing technologies allow detailed inference of insertions and deletion variation in species and populations.

View Article and Find Full Text PDF

Sequencing the mitochondrial genome of the tunicate Oikopleura dioica is a challenging task due to the presence of long poly-A/T homopolymer stretches, which impair sequencing and assembly. Here, we report on the sequencing and annotation of the majority of the mitochondrial genome of O. dioica by means of combining several DNA and amplicon reads obtained by Illumina and MinIon Oxford Nanopore Technologies with public RNA sequences.

View Article and Find Full Text PDF

Motivation: Currently used methods for estimating branch support in phylogenetic analyses often rely on the classic Felsenstein's bootstrap, parametric tests, or their approximations. As these branch support scores are widely used in phylogenetic analyses, having accurate, fast, and interpretable scores is of high importance.

Results: Here, we employed a data-driven approach to estimate branch support values with a probabilistic interpretation.

View Article and Find Full Text PDF