Using the longest run subsequence problem within homology-based scaffolding.

Sven Schrinner Manish Goel Michael Wulfert Philipp Spohr Korbinian Schneeberger Gunnar W Klau

Algorithms Mol Biol

Algorithmic Bioinformatics, Heinrich Heine University Düsseldorf, Düsseldorf, Germany.

Published: June 2021

Genome assembly is a key challenge in computational genomics. Homology-based scaffolding links and orders smaller DNA sequences (contigs) into larger structures (pseudo-chromosomes) with the help of a second, incomplete assembly of a related species.
The authors address a specific issue in this setting: alignments of binned regions within one contig are used to find the most homologous contig in the other assembly, and ordering the contigs of the other assembly is formulated as the longest run subsequence (LRS) problem.
The study shows that LRS is NP-hard, presents reduction rules and two algorithmic approaches that together solve large instances to provable optimality, and applies them to realistic instances from partial Arabidopsis thaliana assemblies in short computation time. All data and source code are freely available.

Genome assembly is one of the most important problems in computational genomics. Here, we suggest addressing an issue that arises in homology-based scaffolding, that is, when linking and ordering contigs to obtain larger pseudo-chromosomes by means of a second incomplete assembly of a related species. The idea is to use alignments of binned regions in one contig to find the most homologous contig in the other assembly. We show that ordering the contigs of the other assembly can be expressed by a new string problem, the longest run subsequence problem (LRS). We show that LRS is NP-hard and present reduction rules and two algorithmic approaches that, together, are able to solve large instances of LRS to provable optimality. All data used in the experiments as well as our source code are freely available. We demonstrate its usefulness within an existing larger scaffolding approach by solving realistic instances resulting from partial Arabidopsis thaliana assemblies in short computation time.
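
The abstract names the LRS problem without defining it. The sketch below assumes the formulation as we understand it: given a string in which each position is a bin labeled with its best-matching contig, find a longest subsequence in which every symbol occurs in at most one run of consecutive positions. The function name and the subset-based dynamic program are illustrative assumptions, not the authors' implementation; the running time grows exponentially with the alphabet size, which is consistent with the NP-hardness result.

```python
def longest_run_subsequence_length(s: str) -> int:
    """Length of a longest subsequence of s in which each symbol forms at most one run.

    Illustrative exact dynamic program over subsets of symbols (an assumption,
    not the method from the paper). Exponential in the number of distinct symbols.
    """
    symbols = sorted(set(s))
    bit = {ch: 1 << i for i, ch in enumerate(symbols)}

    # State: (bitmask of symbols that already have a run, symbol of the open run).
    # Value: best subsequence length reaching that state.
    best = {(0, None): 0}

    for ch in s:
        updated = dict(best)  # skipping ch keeps every existing state
        for (used, current), length in best.items():
            if ch == current:
                target = (used, current)        # extend the open run
            elif not used & bit[ch]:
                target = (used | bit[ch], ch)   # open the single run allowed for ch
            else:
                continue                        # ch already had its run; must skip
            if updated.get(target, -1) < length + 1:
                updated[target] = length + 1
        best = updated

    return max(best.values())


if __name__ == "__main__":
    # Bins of one contig, each labeled with a hypothetical best-matching contig.
    print(longest_run_subsequence_length("abcabc"))
```

In a scaffolding setting, the order in which the runs appear in an optimal subsequence would suggest an order for the contigs of the other assembly; recovering the subsequence itself would require storing back-pointers in addition to the lengths.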

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8240273
DOI: http://dx.doi.org/10.1186/s13015-021-00191-8

Publication Analysis

Top Keywords: longest subsequence; subsequence problem; homology-based scaffolding; ordering contigs; problem homology-based; scaffolding genome; assembly; genome assembly; assembly problems; problems computational

Similar Publications

Slit Lamp Report Generation and Question Answering: Development and Validation of a Multimodal Transformer Model with Large Language Model Integration.

J Med Internet Res

December 2024

Guangzhou Cadre and Talent Health Management Center, Guangzhou, China.

Ziwei Zhao Weiyi Zhang Xiaolan Chen Fan Song James Gunasegaram

Background: Large language models have shown remarkable efficacy in various medical research and clinical applications. However, their skills in medical image recognition and subsequent report generation or question answering (QA) remain limited.

Objective: We aim to finetune a multimodal, transformer-based model for generating medical reports from slit lamp images and develop a QA system using Llama2.

dwMLCS: An Efficient MLCS Algorithm based on Dynamic and Weighted Directed Acyclic Graph.

IEEE/ACM Trans Comput Biol Bioinform

July 2024

Changyong Yu Dekuan Gao Xu Guo Haitao Ma Yuhai Zhao

The problem of finding the longest common subsequence (MLCS) for multiple sequences is a computationally intensive and challenging problem that has significant applications in various fields such as text comparison, pattern recognition, and gene diagnosis. Currently, the dominant point-based MLCS algorithms have become popular and extensively studied. Generally, they construct the directed acyclic graph (DAG) of matching points and convert the MLCS problem into a search for the longest paths in the DAG.

Identification of pattern mining algorithm for rugby league players positional groups separation based on movement patterns.

PLoS One

May 2024

Carnegie School of Sport, Leeds Beckett University, Leeds, United Kingdom.

Victor Elijah Adeyemo Anna Palczewska Ben Jones Dan Weaving

The application of pattern mining algorithms to extract movement patterns from sports big data can improve training specificity by facilitating a more granular evaluation of movement. Since movement patterns can only occur as consecutive, non-consecutive, or non-sequential, this study aimed to identify the best set of movement patterns for player movement profiling in professional rugby league and quantify the similarity among distinct movement patterns. Three pattern mining algorithms (l-length Closed Contiguous [LCCspm], Longest Common Subsequence [LCS] and AprioriClose) were used to extract patterns to profile elite rugby football league hookers (n = 22 players) and wingers (n = 28 players) match-games movements across 319 matches.

Using Sequence Analyses to Quantitatively Measure Oropharyngeal Swallowing Temporality in Point-of-Care Ultrasound Examinations: A Pilot Study.

J Clin Med

April 2024

Research Institute for Smart Ageing, The Hong Kong Polytechnic University, Hong Kong SAR, China.

Wilson Yiu Shun Lam Elaine Kwong Huberta Wai Tung Chan Yong-Ping Zheng

(1) Background: Swallowing is a complex process that comprises well-timed control of oropharyngeal and laryngeal structures to achieve airway protection and swallowing efficiency. To understand its temporality, previous research adopted adherence measures and revealed obligatory pairs in healthy swallows and the effect of aging and bolus type on the variability of event timing and order. This study aimed to (i) propose a systemic conceptualization of swallowing physiology, (ii) apply sequence analyses, a set of information-theoretic and bioinformatic methods, to quantify and characterize swallowing temporality, and (iii) investigate the effect of aging and dysphagia on the quantified variables using sequence analyses measures.

ICGA-GPT: report generation and question answering for indocyanine green angiography images.

Br J Ophthalmol

September 2024

School of Optometry, The Hong Kong Polytechnic University, Kowloon, Hong Kong, China.

Xiaolan Chen Weiyi Zhang Ziwei Zhao Pusheng Xu Yingfeng Zheng

Background: Indocyanine green angiography (ICGA) is vital for diagnosing chorioretinal diseases, but its interpretation and patient communication require extensive expertise and time-consuming efforts. We aim to develop a bilingual ICGA report generation and question-answering (QA) system.

Methods: Our dataset comprised 213 129 ICGA images from 2919 participants.
