Motivation: Evolutionary histories can change from one part of the genome to another. The potential for discordance between gene trees has motivated the development of summary methods that reconstruct a species tree from an input collection of gene trees. ASTRAL is a widely used summary method and has been able to scale to relatively large datasets. However, the size of genomic datasets is quickly growing. Despite its relative efficiency, the current single-threaded implementation of ASTRAL is falling behind the data growth trends and is not able to analyze the largest available datasets in a reasonable time.
Results: ASTRAL uses dynamic programming and is not trivially parallel. In this paper, we introduce ASTRAL-MP, the first version of ASTRAL that can exploit parallelism and also uses randomization techniques to speed up some of its steps. Importantly, ASTRAL-MP can take advantage of not just multiple CPU cores but also one or several graphics processing units (GPUs). The ASTRAL-MP code scales very well with increasing CPU cores, and its GPU version, implemented in OpenCL, can achieve speedups of up to 158× compared to ASTRAL-III. Using GPUs and multiple cores, ASTRAL-MP is able to analyze datasets with 10 000 species or datasets with more than 100 000 genes in <2 days.
Availability and Implementation: ASTRAL-MP is available at https://github.com/smirarab/ASTRAL/tree/MP.
Supplementary Information: Supplementary data are available at Bioinformatics online.
DOI: http://dx.doi.org/10.1093/bioinformatics/btz211