Increasingly, large phylogenomic data sets include transcriptomic data from nonmodel organisms. This has not only allowed controversial and unexplored evolutionary relationships in the tree of life to be addressed but has also increased the risk of inadvertently including paralogs in the analysis. Although paralog inclusion may be expected to decrease phylogenetic support, it is not clear whether it could also drive highly supported artifactual relationships. Many groups, including the hyperdiverse Lissamphibia, are especially susceptible to these issues because of ancient gene duplication events, small numbers of sequenced genomes, and the increasing use of transcriptomes to resolve historically conflicting taxonomic hypotheses. We tested the potential impact of paralog inclusion on the topologies and timetree estimates of the Lissamphibia using published and de novo sequencing data that include 18 amphibian species, from which 2,656 single-copy gene families were identified. A novel paralog filtering approach yielded four differently curated data sets, which were used for phylogenetic reconstruction with Bayesian inference, maximum likelihood, and quartet-based supertrees. We found that paralogs drive strongly supported conflicting hypotheses within the Lissamphibia (Batrachia and Procera) and older divergence time estimates, even within groups where no variation in topology was observed. All investigated methods except Bayesian inference with the CAT-GTR model were sensitive to paralogs, but with filtering they converged on the same answer (Batrachia). This is the first large-scale study to address the impact of orthology selection using transcriptomic data, and it emphasizes the importance of quality over quantity, particularly for understanding relationships of poorly sampled taxa.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6526904
DOI: http://dx.doi.org/10.1093/molbev/msz067

Similar Publications

Premise: Target sequence capture (Hyb-Seq) is a cost-effective sequencing strategy that employs RNA probes to enrich for specific genomic sequences. By targeting conserved low-copy orthologs, Hyb-Seq enables efficient phylogenomic investigations. Here, we present Asparagaceae1726, a Hyb-Seq probe set targeting 1726 low-copy nuclear genes for phylogenomics in the angiosperm family Asparagaceae, which will aid the often-challenging delineation and resolution of evolutionary relationships within Asparagaceae.

RPL22 is a tumor suppressor in MSI-high cancers and a splicing regulator of MDM4.

Cell Rep

August 2024

Division of Hematology/Oncology, Department of Medicine, Helen Diller Family Comprehensive Cancer Center, Bakar Computational Health Sciences Institute, Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA; Chan Zuckerberg Biohub San Francisco, San Francisco, CA, USA; San Francisco Veterans Affairs Medical Center, San Francisco, CA, USA.

Microsatellite instability-high (MSI-H) tumors are malignant tumors that, despite harboring a high mutational burden, often have intact TP53. One of the most frequent mutations in MSI-H tumors is a frameshift mutation in RPL22, a ribosomal protein. Here, we identified RPL22 as a modulator of MDM4 splicing through an alternative splicing switch in exon 6.

We characterized the regulatory mechanisms of PRPF40A, a splicing factor lacking a canonical RNA-binding domain, and its role in human myeloid cell survival and differentiation. Upon PRPF40A knockdown, HL-60 cells displayed increased cell death, decreased proliferation, and a slight differentiation phenotype with upregulation of immune activation genes. Suggestive of both redundant and specific functions, cell death but not proliferation was rescued by overexpression of its paralog PRPF40B.

Open-Source Bioinformatic Pipeline to Improve PMS2 Genetic Testing Using Short-Read NGS Data.

J Mol Diagn

August 2024

Hereditary Cancer Program, Catalan Institute of Oncology, L'Hospitalet de Llobregat, Spain; Hereditary Cancer Group, Molecular Mechanisms and Experimental Therapy in Oncology Program, Institut d'Investigació Biomèdica de Bellvitge, L'Hospitalet de Llobregat, Spain; Ciber Oncología, Instituto Salud Carlos III, Madrid, Spain.

The molecular diagnosis of mismatch repair-deficient cancer syndromes is hampered by difficulties in sequencing the PMS2 gene, mainly owing to the PMS2CL pseudogene. Next-generation sequencing short reads cannot be mapped unambiguously by standard pipelines, compromising variant calling accuracy. This study aimed to provide a refined bioinformatic pipeline for PMS2 mutational analysis and explore PMS2 germline pathogenic variant prevalence in an unselected hereditary cancer (HC) cohort.

Autophagy is a pivotal regulatory and catabolic process, induced under various stressful conditions, including hypoxia. However, little is known about alternative splicing of autophagy genes in the hypoxic landscape in breast cancer. Our research unravels the hitherto unreported alternative splicing of BNIP3L, a crucial hypoxia-induced autophagic gene.
