The performance of most error-correction (EC) algorithms that operate on genomic reads depends on the proper choice of their configuration parameters, such as the value of k in k-mer based techniques. In this work, we target the problem of finding the best values of these configuration parameters to optimize error correction and consequently improve genome assembly. We do this adaptively, per dataset and per EC tool, because the optimal configuration parameters differ across datasets (i.e., across sequencing platforms and species) and vary with the EC algorithm being applied. We use language modeling techniques from the Natural Language Processing (NLP) domain in our algorithmic suite, Athena, to automatically tune the performance-sensitive configuration parameters. Using N-Gram and Recurrent Neural Network (RNN) language modeling, we validate the intuition that EC performance can be computed quantitatively and efficiently using the "perplexity" metric, repurposed from NLP. After training the language model, we show that the perplexity metric calculated from a sample of the test (or production) data has a strong negative correlation with the quality of error correction of erroneous NGS reads. We therefore use the perplexity metric to guide a hill climbing-based search that converges toward the best configuration parameter value. Our approach is suitable for both de novo and comparative sequencing (resequencing), eliminating the need for a reference genome to serve as the ground truth. We find that Athena can automatically find the optimal value of k with very high accuracy for 7 real datasets using 3 different k-mer based EC algorithms: Lighter, Blue, and Racer.
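To make the perplexity metric concrete, the following is a minimal sketch of a character-level N-Gram language model over DNA reads. It is not Athena's implementation: the function names `train_ngram` and `perplexity`, the four-letter alphabet, the choice of n = 3, and the add-one smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumed alphabet; real reads may also contain N


def train_ngram(reads, n=3):
    """Count n-grams and their (n-1)-base contexts over a list of reads."""
    ngrams, contexts = Counter(), Counter()
    for read in reads:
        for i in range(len(read) - n + 1):
            gram = read[i:i + n]
            ngrams[gram] += 1
            contexts[gram[:-1]] += 1
    return ngrams, contexts


def perplexity(reads, ngrams, contexts, n=3):
    """exp(-mean log P(base | previous n-1 bases)), with add-one smoothing."""
    log_prob, count = 0.0, 0
    for read in reads:
        for i in range(len(read) - n + 1):
            gram = read[i:i + n]
            p = (ngrams[gram] + 1) / (contexts[gram[:-1]] + len(ALPHABET))
            log_prob += math.log(p)
            count += 1
    if count == 0:
        raise ValueError("no n-grams to score")
    return math.exp(-log_prob / count)


# Toy data (hypothetical): a repetitive training set, one clean and one noisy read.
model = train_ngram(["ACGTACGTACGTACGT"] * 5)
clean = perplexity(["ACGTACGTACGT"], *model)
noisy = perplexity(["ACGGTCATAGCA"], *model)
```

On this toy data the read that matches the training pattern scores a lower perplexity than the read with errors, which is the direction of the relationship the abstract describes. An untrained model assigns every base probability 1/4, so its perplexity equals the alphabet size.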
The inverse relation between the perplexity metric and alignment rate holds under all our tested conditions: for real and synthetic datasets, for all kinds of sequencing errors (insertion, deletion, and substitution), and for high and low error rates. The absolute value of that correlation is at least 73%. In our experiments, the best value of k found by Athena achieves an alignment rate within 0.53% of the oracle best value of k found through brute-force search (i.e., scanning the entire range of k values). Athena's selected value of k lies within the top-3 best k values using N-Gram models and the top-5 best k values using RNN models. With the best parameter selection by Athena, the assembly quality (NG50) improves by a geometric mean of 4.72X across the 7 real datasets.
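The contrast between a perplexity-guided hill climb and the brute-force scan over k can be sketched as follows. This is an illustration under stated assumptions, not Athena's search procedure: `hill_climb_k`, `brute_force_k`, the step size, and the quadratic `toy_score` (standing in for "perplexity of the reads corrected with this k") are all hypothetical.

```python
def hill_climb_k(score, k_start, k_min, k_max, step=1):
    """Move k toward a lower score until no neighbour improves.

    Returns (k, number_of_distinct_score_evaluations).
    """
    cache = {}

    def evaluate(k):
        # Each distinct k is scored once; in practice a score means one EC run.
        if k not in cache:
            cache[k] = score(k)
        return cache[k]

    k = k_start
    while True:
        neighbours = [c for c in (k - step, k + step) if k_min <= c <= k_max]
        best = min(neighbours, key=evaluate, default=k)
        if evaluate(best) >= evaluate(k):
            return k, len(cache)
        k = best


def brute_force_k(score, k_min, k_max):
    """Oracle-style scan: score every k in the range and keep the minimum."""
    return min(range(k_min, k_max + 1), key=score)


def toy_score(k):
    # Hypothetical stand-in for perplexity as a function of k, minimal at k = 25.
    return (k - 25) ** 2


k_found, evaluations = hill_climb_k(toy_score, k_start=15, k_min=10, k_max=40)
k_oracle = brute_force_k(toy_score, 10, 40)
```

On this single-minimum toy score both searches agree on k, and the hill climb scores fewer values of k than the full scan. A hill climb only guarantees a local minimum, so on a score with several dips the two can disagree.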
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6834855 | PMC |
http://dx.doi.org/10.1038/s41598-019-52196-4 | DOI Listing |
Sci Rep
January 2025
Business School, Sichuan University, 610059, Chengdu, China.
The comprehensive benefit evaluation of LID based on multi-criteria decision-making methods faces technical issues such as the uncertainties and vagueness in hybrid information sources, which can affect the overall evaluation results and ranking of alternatives. This study introduces a multi-indicator fuzzy comprehensive benefit evaluation approach for the selection of LID measures, aiming to provide a robust and holistic framework for evaluating their benefits at the community level. The proposed methodology integrates quantitative environmental and economic indicators with qualitative social benefit indicators, combining the use of the Storm Water Management Model (SWMM) and ArcGIS for scenario-based analysis, and the use of hesitant fuzzy language sets and Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) for decision-making.
Appl Radiat Isot
January 2025
Experimental Nuclear Physics Department, Nuclear Research Centre, Egyptian Atomic Energy Authority, Egypt; Cyclotron Facility, Egyptian Atomic Energy Authority, Egypt.
Neutron and gamma-ray shielding design for a 30Ci (1.11TBq) Am-Be irradiation facility is studied using MCNP5 Monte Carlo simulation code. The study focuses on the optimization of the shielding layers of the previously planned neutron irradiation facility.
PLoS One
January 2025
Department of Electrical Engineering, Shiraz Branch, Islamic Azad University, Shiraz, Iran.
CNN is considered an efficient tool in brain image segmentation. However, neonatal brain images require specific methods due to their nature and structural differences from adult brain images. Hence, it is necessary to determine the optimal structure and parameters for these models to achieve the desired results.
Med Biol Eng Comput
January 2025
Department of Industrial Engineering, University of Florence, Via Di Santa Marta 3, 50139, Florence, Italy.
In bone tumor resection surgery, patient-specific cutting guides aid the surgeon in the resection of a precise part of the bone. Despite the use of automation methodologies in surgical guide modeling, to date, the placement of cutting planes is a manual task. This work presents an algorithm for the automatic positioning of cutting planes to reduce healthy bone resected and thus improve post-operative outcomes.
Interv Neuroradiol
January 2025
Department of Interventional Neuroradiology, Austin Health, Heidelberg, Melbourne, Australia.
Background: Intrasaccular flow diversion using the woven endobridge device (WEB; MicroVention, Aliso Viejo, CA, USA) for the treatment of intracranial aneurysms has demonstrated large scale safety and efficacy. However, limitations arise from its structural configuration, restricting its application to specific aneurysm sizes and shapes.
Technique Overview: We introduce the CUPCAKE technique, a combination of conventional coiling followed by WEB intrasaccular flow disruption in select cases of atypical aneurysms with technically challenging morphology not typically treatable by WEB alone.