ASTRAL-MP: scaling ASTRAL to very large datasets using randomization and parallelization.

Bioinformatics

Department of Electrical and Computer Engineering, University of California at San Diego, La Jolla, CA, USA.

Published: October 2019

Motivation: Evolutionary histories can change from one part of the genome to another. The potential for discordance between gene trees has motivated the development of summary methods that reconstruct a species tree from an input collection of gene trees. ASTRAL is a widely used summary method and has been able to scale to relatively large datasets. However, genomic datasets are growing quickly. Despite its relative efficiency, the current single-threaded implementation of ASTRAL is falling behind data growth trends and cannot analyze the largest available datasets in a reasonable time.

Results: ASTRAL uses dynamic programming and is not trivially parallel. In this paper, we introduce ASTRAL-MP, the first version of ASTRAL that can exploit parallelism; it also uses randomization techniques to speed up some of its steps. Importantly, ASTRAL-MP can take advantage of not just multiple CPU cores but also one or several graphics processing units (GPUs). ASTRAL-MP scales very well with an increasing number of CPU cores, and its GPU version, implemented in OpenCL, achieves speedups of up to 158× over ASTRAL-III. Using GPUs and multiple cores, ASTRAL-MP can analyze datasets with 10 000 species or datasets with more than 100 000 genes in under 2 days.
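Why the dynamic programming is "not trivially parallel" is easier to see next to the part of ASTRAL's objective that is. ASTRAL seeks the species tree that shares the largest number of induced quartet trees with the input gene trees, and that score is a sum of independent per-gene-tree terms. The Python sketch below illustrates only this decomposition: it brute-forces the quartet support of each gene tree and dispatches one task per gene tree to a worker pool. The split-list tree representation and all function names are hypothetical, the sketch assumes every gene tree contains all taxa, and it is not ASTRAL-MP's algorithm. ASTRAL does not enumerate quartets explicitly, and ASTRAL-MP parallelizes over CPU cores and OpenCL GPUs as described in the abstract.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from itertools import combinations
from typing import FrozenSet, Iterable, List, Optional, Sequence

# Hypothetical representation: an unrooted tree is a list of splits, each split
# being the frozenset of leaf names on one side of an internal edge.
Splits = List[FrozenSet[str]]
Topology = Optional[FrozenSet[FrozenSet[str]]]


def quartet_topology(splits: Splits, q: Sequence[str]) -> Topology:
    """Return the quartet tree (e.g. ab|cd) a tree induces on q, or None if unresolved."""
    a, b, c, d = q
    for x, y in (({a, b}, {c, d}), ({a, c}, {b, d}), ({a, d}, {b, c})):
        for s in splits:
            # An edge induces x|y if pair x lies on one side and pair y on the other.
            # Assumes all four leaves are present in the tree (no missing taxa).
            if (x <= s and not (y & s)) or (y <= s and not (x & s)):
                return frozenset((frozenset(x), frozenset(y)))
    return None


def gene_tree_support(species: Splits, gene: Splits, taxa: Sequence[str]) -> int:
    """Brute force: count quartets on which one gene tree agrees with the species tree."""
    count = 0
    for q in combinations(sorted(taxa), 4):
        g = quartet_topology(gene, q)
        if g is not None and g == quartet_topology(species, q):
            count += 1
    return count


def quartet_score(species: Splits, genes: Iterable[Splits], taxa: Sequence[str],
                  executor: Optional[Executor] = None) -> int:
    """Sum per-gene-tree support; each gene tree is an independent task."""
    own = executor is None
    pool = ThreadPoolExecutor(max_workers=4) if own else executor
    try:
        futures = [pool.submit(gene_tree_support, species, g, taxa) for g in genes]
        # Partial counts are combined only at the end, so task order does not matter.
        return sum(f.result() for f in futures)
    finally:
        if own:
            pool.shutdown()
```

With threads, this pure-Python loop gains little real speed because of the global interpreter lock; for CPU-bound work one could pass a `concurrent.futures.ProcessPoolExecutor` instead, created under an `if __name__ == "__main__":` guard. The result is the same either way, because each gene tree's contribution is computed independently and only the final sum couples them.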

Availability and implementation: ASTRAL-MP is available at https://github.com/smirarab/ASTRAL/tree/MP.

Supplementary Information: Supplementary data are available at Bioinformatics online.

DOI: http://dx.doi.org/10.1093/bioinformatics/btz211

Similar Publications

An empirical study of LLaMA3 quantization: from LLMs to MLLMs.

Vis Intell

December 2024

Department of Information Technology and Electrical Engineering, ETH Zurich, Sternwartstrasse 7, Zürich, Switzerland.

Wei Huang Xingyu Zheng Xudong Ma Haotong Qin Chengtao Lv

The LLaMA family, a collection of foundation language models ranging from 7B to 65B parameters, has become one of the most powerful open-source large language models (LLMs) and the popular LLM backbone of multi-modal large language models (MLLMs), widely used in computer vision and natural language understanding tasks. In particular, LLaMA3 models have recently been released and have achieved impressive performance in various domains with super-large scale pre-training on over 15T tokens of data. Given the wide application of low-bit quantization for LLMs in resource-constrained scenarios, we explore LLaMA3's capabilities when quantized to low bit-width.

tdCoxSNN: Time-dependent Cox survival neural network for continuous-time dynamic prediction.

J R Stat Soc Ser C Appl Stat

January 2025

Department of Biostatistics and Health Data Science, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA.

Lang Zeng Jipeng Zhang Wei Chen Ying Ding

The aim of dynamic prediction is to provide individualized risk predictions over time, which are updated as new data become available. In pursuit of constructing a dynamic prediction model for a progressive eye disorder, age-related macular degeneration (AMD), we propose a time-dependent Cox survival neural network (tdCoxSNN) to predict its progression using longitudinal fundus images. tdCoxSNN builds upon the time-dependent Cox model by utilizing a neural network to capture the nonlinear effect of time-dependent covariates on the survival outcome.

Diagnosis and prognosis of melanoma from dermoscopy images using machine learning and deep learning: a systematic literature review.

BMC Cancer

January 2025

Department of Data Science, Faculty of Interdisciplinary Science and Technology, Tarbiat Modares University, Tehran, Iran.

Hoda Naseri Ali A Safaei

Background: Melanoma is a highly aggressive skin cancer, where early and accurate diagnosis is crucial to improve patient outcomes. Dermoscopy, a non-invasive imaging technique, aids in melanoma detection but can be limited by subjective interpretation. Recently, machine learning and deep learning techniques have shown promise in enhancing diagnostic precision by automating the analysis of dermoscopy images.

Predictive equation derived from 6,497 doubly labelled water measurements enables the detection of erroneous self-reported energy intake.

Nat Food

January 2025

School of Biological Sciences, University of Aberdeen, Aberdeen, UK.

Rania Bajunaid Chaoqun Niu Catherine Hambly Zongfang Liu Yosuke Yamada

Nutritional epidemiology aims to link dietary exposures to chronic disease, but the instruments for evaluating dietary intake are inaccurate. One way to identify unreliable data and the sources of errors is to compare estimated intakes with the total energy expenditure (TEE). In this study, we used the International Atomic Energy Agency Doubly Labeled Water Database to derive a predictive equation for TEE using 6,497 measures of TEE in individuals aged 4 to 96 years.

Open Soil Spectral Library (OSSL): Building reproducible soil calibration models through open development and community engagement.

PLoS One

January 2025

Woodwell Climate Research Center, Falmouth, MA, United States of America.

José L Safanelli Tomislav Hengl Leandro L Parente Robert Minarik Dellena E Bloom

Soil spectroscopy is a widely used method for estimating soil properties that are important to environmental and agricultural monitoring. However, a bottleneck to its more widespread adoption is the need for establishing large reference datasets for training machine learning (ML) models, which are called soil spectral libraries (SSLs). Similarly, the prediction capacity of new samples is also subject to the number and diversity of soil types and conditions represented in the SSLs.