DEPP: Deep Learning Enables Extending Species Trees using Single Genes.

Syst Biol

Department of Electrical and Computer Engineering, UC San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA.

Published: May 2023

Placing new sequences onto reference phylogenies is increasingly used for analyzing environmental samples, especially microbiomes. Existing placement methods assume that query sequences have evolved under specific models directly on the reference phylogeny. For example, they assume single-gene data (e.g., 16S rRNA amplicons) have evolved under the GTR model on a gene tree. Placement, however, often has a more ambitious goal: extending a (genome-wide) species tree given data from individual genes without knowing the evolutionary model. Addressing this challenging problem requires new directions. Here, we introduce Deep-learning Enabled Phylogenetic Placement (DEPP), an algorithm that learns to extend species trees using single genes without prespecified models. In simulations and on real data, we show that DEPP can match the accuracy of model-based methods without any prior knowledge of the model. We also show that DEPP can update the multilocus microbial tree-of-life with single genes with high accuracy. We further demonstrate that DEPP can combine 16S and metagenomic data onto a single tree, enabling community structure analyses that take advantage of both sources of data. [Deep learning; gene tree discordance; metagenomics; microbiome analyses; neural networks; phylogenetic placement.].

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10198656
DOI: http://dx.doi.org/10.1093/sysbio/syac031
