Species trees have traditionally been inferred from a few selected markers, and genome-wide investigations remain largely restricted to model organisms or small groups of species for which sampling of fresh material is available, leaving out most of the existing and historical species diversity. The genomes of an increasing number of species, including specimens extracted from natural history collections, are being sequenced at low depth. While these data sets are widely used to analyse organelle genomes, the nuclear fraction is generally ignored. Here we evaluate different reference-based methods to infer phylogenies of large taxonomic groups from such data sets. Using the example of the Oleeae tribe, a worldwide-distributed group, we build phylogenies based on single nucleotide polymorphisms (SNPs) obtained using two reference genomes (the olive and ash trees). The inferred phylogenies are overall congruent, yet present differences that might reflect the effect of distance to the reference on the amount of missing data. To limit this issue, genome complexity was reduced by using pairs of orthologous coding sequences as the reference, thus allowing us to combine SNPs obtained using two distinct references. Concatenated and coalescence trees based on these combined SNPs suggest events of incomplete lineage sorting and/or hybridization during the diversification of this large phylogenetic group. Our results show that genome-wide phylogenetic trees can be inferred from low-depth sequence data sets for eukaryote groups with complex genomes and histories of reticulate evolution. This opens new avenues for large-scale phylogenomics and biogeographical analyses covering both the extant and the historical diversity stored in museum collections.

Source
http://dx.doi.org/10.1111/1755-0998.13016
