AI Article Synopsis

  • Homology-based scaffolding is a step in genome assembly that links and orders smaller DNA sequences (contigs) into larger structures (pseudo-chromosomes) with the help of a second, incomplete assembly of a related species.
  • The researchers address a specific issue in this step: they use alignments of binned regions within a contig to find the most homologous contig in the other assembly, and formulate the resulting ordering task as the longest run subsequence (LRS) problem.
  • The study shows that LRS is NP-hard, presents reduction rules and two algorithmic approaches that together solve large instances to provable optimality, and applies them to realistic instances from partial Arabidopsis thaliana assemblies in short computation time. All data and source code are publicly available.

Article Abstract

Genome assembly is one of the most important problems in computational genomics. Here, we suggest addressing an issue that arises in homology-based scaffolding, that is, when linking and ordering contigs to obtain larger pseudo-chromosomes by means of a second incomplete assembly of a related species. The idea is to use alignments of binned regions in one contig to find the most homologous contig in the other assembly. We show that ordering the contigs of the other assembly can be expressed by a new string problem, the longest run subsequence problem (LRS). We show that LRS is NP-hard and present reduction rules and two algorithmic approaches that, together, are able to solve large instances of LRS to provable optimality. We demonstrate the usefulness of these methods within an existing larger scaffolding approach by solving realistic instances resulting from partial Arabidopsis thaliana assemblies in short computation time. All data used in the experiments as well as our source code are freely available.
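
The abstract names the LRS problem without defining it. The sketch below assumes the usual formulation: given a string, find a longest subsequence in which every symbol occurs in at most one run (one block of consecutive equal symbols). In the scaffolding setting, each symbol would presumably stand for the best-matching contig of one binned region, which is an interpretation rather than something stated above. The function name and the dynamic program over sets of used symbols are illustrative assumptions, not the authors' implementation; the running time is exponential in the alphabet size, so it only suits small instances.

```python
def longest_run_subsequence(s: str) -> int:
    """Length of a longest subsequence of s in which each symbol forms at most one run.

    Assumed definition of LRS; exact but exponential in the number of distinct symbols.
    """
    index = {c: i for i, c in enumerate(sorted(set(s)))}
    # best[(mask, last)] = longest run subsequence found so far that uses exactly
    # the symbols in bitmask `mask` and currently ends with symbol `last`.
    best = {(0, -1): 0}
    for ch in s:
        c = index[ch]
        bit = 1 << c
        updates = {}
        for (mask, last), length in best.items():
            if last == c:
                key = (mask, last)        # extend the current run
            elif not mask & bit:
                key = (mask | bit, c)     # open the single allowed run of this symbol
            else:
                continue                  # symbol's run is already closed: must skip
            if updates.get(key, -1) < length + 1:
                updates[key] = length + 1
        # Old states stay in `best`, which models skipping the current character.
        for key, length in updates.items():
            if best.get(key, -1) < length:
                best[key] = length
    return max(best.values())


if __name__ == "__main__":
    print(longest_run_subsequence("abacb"))
```

For the string "abacb", one optimal choice is "aacb": dropping the first "b" lets both occurrences of "a" merge into a single run.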

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8240273
DOI: http://dx.doi.org/10.1186/s13015-021-00191-8

Publication Analysis

Top Keywords

longest subsequence: 8
subsequence problem: 8
homology-based scaffolding: 8
ordering contigs: 8
problem homology-based: 4
scaffolding genome: 4
assembly: 4
genome assembly: 4
assembly problems: 4
problems computational: 4

Similar Publications

Background: Large language models have shown remarkable efficacy in various medical research and clinical applications. However, their skills in medical image recognition and subsequent report generation or question answering (QA) remain limited.

Objective: We aim to finetune a multimodal, transformer-based model for generating medical reports from slit lamp images and develop a QA system using Llama2.

Finding the longest common subsequence of multiple sequences (MLCS) is a computationally intensive and challenging problem with significant applications in fields such as text comparison, pattern recognition, and gene diagnosis. Dominant point-based MLCS algorithms are currently popular and extensively studied. Generally, they construct a directed acyclic graph (DAG) of matching points and convert the MLCS problem into a search for the longest paths in the DAG.

The application of pattern mining algorithms to extract movement patterns from sports big data can improve training specificity by facilitating a more granular evaluation of movement. Since movement patterns can only occur as consecutive, non-consecutive, or non-sequential, this study aimed to identify the best set of movement patterns for player movement profiling in professional rugby league and quantify the similarity among distinct movement patterns. Three pattern mining algorithms (l-length Closed Contiguous [LCCspm], Longest Common Subsequence [LCS] and AprioriClose) were used to extract patterns to profile the match movements of elite rugby football league hookers (n = 22 players) and wingers (n = 28 players) across 319 matches.

Swallowing is a complex process that comprises well-timed control of oropharyngeal and laryngeal structures to achieve airway protection and swallowing efficiency. To understand its temporality, previous research adopted adherence measures and revealed obligatory pairs in healthy swallows and the effect of aging and bolus type on the variability of event timing and order. This study aimed to (i) propose a systemic conceptualization of swallowing physiology, (ii) apply sequence analyses, a set of information-theoretic and bioinformatic methods, to quantify and characterize swallowing temporality, and (iii) investigate the effect of aging and dysphagia on the quantified variables using sequence analyses measures.

Background: Indocyanine green angiography (ICGA) is vital for diagnosing chorioretinal diseases, but its interpretation and patient communication require extensive expertise and time-consuming efforts. We aim to develop a bilingual ICGA report generation and question-answering (QA) system.

Methods: Our dataset comprised 213 129 ICGA images from 2919 participants.
