Genome assembly is one of the most important problems in computational genomics. Here, we address an issue that arises in homology-based scaffolding, that is, when contigs are linked and ordered into larger pseudo-chromosomes by means of a second, incomplete assembly of a related species. The idea is to use alignments of binned regions in one contig to find the most homologous contig in the other assembly. We show that ordering the contigs of the other assembly can be expressed as a new string problem, the longest run subsequence problem (LRS). We prove that LRS is NP-hard and present reduction rules and two algorithmic approaches that, together, are able to solve large instances of LRS to provable optimality. We demonstrate the usefulness of these methods within an existing larger scaffolding approach by solving realistic instances resulting from partial Arabidopsis thaliana assemblies in short computation time. All data used in the experiments as well as our source code are freely available.
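To make the LRS problem concrete, here is a minimal sketch. It assumes the usual definition, which the abstract does not spell out: given a string, find a longest subsequence in which every symbol occurs in at most one run of consecutive identical symbols. The function name `longest_run_subsequence_length` is hypothetical. The dynamic program below is exponential in the alphabet size and is only an illustration; it is not the reduction rules or either of the two approaches mentioned in the abstract.

```python
def longest_run_subsequence_length(s):
    """Length of a longest subsequence of s in which each symbol forms at most one run.

    Illustrative sketch only (assumed problem definition, hypothetical name).
    State: (set of symbols already used, symbol of the current run) -> best length.
    """
    best = {(frozenset(), None): 0}
    for c in s:
        updated = dict(best)  # option 1: skip the current character
        for (used, last), length in best.items():
            if c == last:
                key = (used, last)        # extend the current run
            elif c not in used:
                key = (used | {c}, c)     # open a new run for an unused symbol
            else:
                continue                  # c already had a run that is closed
            if updated.get(key, -1) < length + 1:
                updated[key] = length + 1
        best = updated
    return max(best.values())


print(longest_run_subsequence_length("abab"))  # 3, e.g. "aab" or "abb"
```

In the scaffolding setting described above, the symbols would presumably stand for contigs of the second assembly matched to consecutive bins, so a solution keeps each contig in a single contiguous block; that mapping is an interpretation of the abstract, not a statement of the authors' pipeline.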
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8240273 | PMC |
http://dx.doi.org/10.1186/s13015-021-00191-8 | DOI Listing |
J Med Internet Res
December 2024
Guangzhou Cadre and Talent Health Management Center, Guangzhou, China.
Background: Large language models have shown remarkable efficacy in various medical research and clinical applications. However, their skills in medical image recognition and subsequent report generation or question answering (QA) remain limited.
Objective: We aim to finetune a multimodal, transformer-based model for generating medical reports from slit lamp images and develop a QA system using Llama2.
IEEE/ACM Trans Comput Biol Bioinform
July 2024
The problem of finding the longest common subsequence (MLCS) for multiple sequences is a computationally intensive and challenging problem that has significant applications in various fields such as text comparison, pattern recognition, and gene diagnosis. Currently, the dominant point-based MLCS algorithms have become popular and extensively studied. Generally, they construct the directed acyclic graph (DAG) of matching points and convert the MLCS problem into a search for the longest paths in the DAG.
PLoS One
May 2024
Carnegie School of Sport, Leeds Beckett University, Leeds, United Kingdom.
The application of pattern mining algorithms to extract movement patterns from sports big data can improve training specificity by facilitating a more granular evaluation of movement. Since movement patterns can only occur as consecutive, non-consecutive, or non-sequential, this study aimed to identify the best set of movement patterns for player movement profiling in professional rugby league and quantify the similarity among distinct movement patterns. Three pattern mining algorithms (l-length Closed Contiguous [LCCspm], Longest Common Subsequence [LCS] and AprioriClose) were used to extract patterns to profile elite rugby football league hookers (n = 22 players) and wingers (n = 28 players) match-games movements across 319 matches.
J Clin Med
April 2024
Research Institute for Smart Ageing, The Hong Kong Polytechnic University, Hong Kong SAR, China.
(1) Background: Swallowing is a complex process that comprises well-timed control of oropharyngeal and laryngeal structures to achieve airway protection and swallowing efficiency. To understand its temporality, previous research adopted adherence measures and revealed obligatory pairs in healthy swallows and the effect of aging and bolus type on the variability of event timing and order. This study aimed to (i) propose a systemic conceptualization of swallowing physiology, (ii) apply sequence analyses, a set of information-theoretic and bioinformatic methods, to quantify and characterize swallowing temporality, and (iii) investigate the effect of aging and dysphagia on the quantified variables using sequence analyses measures.
Br J Ophthalmol
September 2024
School of Optometry, The Hong Kong Polytechnic University, Kowloon, Hong Kong, China.
Background: Indocyanine green angiography (ICGA) is vital for diagnosing chorioretinal diseases, but its interpretation and patient communication require extensive expertise and time-consuming efforts. We aim to develop a bilingual ICGA report generation and question-answering (QA) system.
Methods: Our dataset comprised 213 129 ICGA images from 2919 participants.