Recommendations for improving statistical inference in population genomics.

Parul Johri, Charles F Aquadro, Mark Beaumont, Brian Charlesworth, Laurent Excoffier, Adam Eyre-Walker, Peter D Keightley, Michael Lynch, Gil McVean, Bret A Payseur, Susanne P Pfeifer, Wolfgang Stephan, Jeffrey D Jensen

PLoS Biol

School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America.

Published: May 2022

The field of population genomics has grown rapidly in response to the recent advent of affordable, large-scale sequencing technologies. As opposed to the situation during the majority of the 20th century, in which the development of theoretical and statistical population genetic insights outpaced the generation of data to which they could be applied, genomic data are now being produced at a far greater rate than they can be meaningfully analyzed and interpreted. With this wealth of data has come a tendency to focus on fitting specific (and often rather idiosyncratic) models to data, at the expense of a careful exploration of the range of possible underlying evolutionary processes. For example, the approach of directly investigating models of adaptive evolution in each newly sequenced population or species often neglects the fact that a thorough characterization of ubiquitous nonadaptive processes is a prerequisite for accurate inference. We here describe the perils of these tendencies, present our consensus views on current best practices in population genomic data analysis, and highlight areas of statistical inference and theory that are in need of further attention. Thereby, we argue for the importance of defining a biologically relevant baseline model tuned to the details of each new analysis, of skepticism and scrutiny in interpreting model fitting results, and of carefully defining addressable hypotheses and underlying uncertainties.
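The abstract's point that nonadaptive processes must be characterized before testing for adaptation can be made concrete with a small sketch. It is an illustration, not the authors' method. It uses Tajima's D, a standard summary of the site frequency spectrum (SFS). Under the standard neutral model with constant population size, the expected unfolded SFS is E[ξ_i] = θ/i, and for that spectrum D is exactly 0. An excess of rare variants drives D negative. Both a recent selective sweep and population growth produce such an excess, so a negative D on its own cannot separate adaptation from demography. The function names and the input format (unfolded SFS counts for derived-allele frequencies 1 to n-1) are assumptions chosen for this example.

```python
import math


def expected_neutral_sfs(n, theta):
    """Expected unfolded SFS under the standard neutral model
    (constant size, no selection): E[xi_i] = theta / i, i = 1..n-1."""
    return [theta / i for i in range(1, n)]


def tajimas_d(sfs):
    """Tajima's D from an unfolded SFS (counts for i = 1..n-1).

    Hypothetical helper for illustration; requires n >= 4, because the
    variance term is zero for n <= 3.
    """
    n = len(sfs) + 1
    if n < 4:
        raise ValueError("Tajima's D needs a sample size of at least 4")
    s = sum(sfs)  # number of segregating sites
    if s <= 0:
        raise ValueError("no segregating sites")

    # Mean pairwise differences (theta_pi) computed from the SFS.
    pairs = n * (n - 1) / 2
    pi = sum(i * (n - i) * x for i, x in enumerate(sfs, start=1)) / pairs

    # Standard constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # theta_pi minus Watterson's theta_W, scaled by its standard deviation.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    baseline = expected_neutral_sfs(10, 10.0)
    print(tajimas_d(baseline))  # essentially 0 under the neutral baseline

    skewed = list(baseline)
    skewed[0] *= 3  # excess singletons, as from a sweep OR from growth
    print(tajimas_d(skewed))  # negative
```

For example, tripling the singleton class of the neutral expectation for n = 10 gives a negative D. The sketch cannot say whether that reflects a sweep or population growth. Resolving that question requires a demographic baseline fitted to the data, which is the inference problem the recommendations address.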

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9154105
DOI: http://dx.doi.org/10.1371/journal.pbio.3001669

Similar Publications

Cross-Sectoral Comparisons of Process Quality Indicators of Health Care Across Residential Regions Using Restricted Mean Survival Time.

Med Care

November 2024

Institute of Clinical Biometrics, Center for Medical Data Science, Medical University of Vienna, Vienna, Austria.

Hana Šinkovec, Walter Gall, Georg Heinze

Background: Practice guidelines recommend patient management based on scientific evidence. Quality indicators gauge adherence to such recommendations and assess health care quality. They are usually defined as adverse event rates, which may not fully capture guideline adherence over time.

Bayesian semiparametric inference in longitudinal metabolomics data.

Sci Rep

December 2024

Department of Statistical Science, Duke University, Durham, 27708-0251, USA.

Abhra Sarkar, Ornella Cominetti, Ivan Montoliu, Joanne Hosking, Jonathan Pinkney

The article is motivated by an application to the EarlyBird cohort study aiming to explore how anthropometrics and clinical and metabolic processes are associated with obesity and glucose control during childhood. There is interest in inferring the relationship between dynamically changing and high-dimensional metabolites and a longitudinal response. Important aspects of the analysis include the selection of the important set of metabolites and the accommodation of missing data in both response and covariate values.

Identification of confounders and estimating the causal effect of place of birth on age-specific childhood vaccination.

BMC Med Inform Decis Mak

December 2024

School of Mathematics, Statistics & Computer Science, University of KwaZulu Natal, Durban, South Africa.

Ashagrie Sharew Iyassu, Haile Mekonnen Fenta, Zelalem G Dessie, Temesgen T Zewotir

Background: In causal analyses, a third factor may distort the relationship between the exposure and outcome variables under study, producing spurious results. In this case, the treatment and control groups (those that do and do not receive the exposure) differ from one another in other essential variables, called confounders.

Method: Place of birth was used as the exposure variable and age-specific childhood vaccination status as the outcome variables.

The necessity of validity diagnostics when drawing causal inferences from observational data: lessons from a multi-database evaluation of the risk of non-infectious uveitis among patients exposed to Remicade.

BMC Med Res Methodol

December 2024

Janssen Research & Development LLC, Global Epidemiology Organization, Raritan, NJ, USA.

James Weaver, Erica A Voss, Guy Cafri, Kathleen Beyrau, Michelle Nashleanas

Background: Autoimmune disorders have primary manifestations such as joint pain and bowel inflammation but can also have secondary manifestations such as non-infectious uveitis (NIU). A regulatory health authority raised concerns after receiving spontaneous reports for NIU following exposure to Remicade, a biologic therapy with multiple indications for which alternative therapies are available. In assessment of this clinical question, we applied validity diagnostics to support observational data causal inferences.

Cross-species regulatory network analysis identifies FOXO1 as a driver of ovarian follicular recruitment.

Sci Rep

December 2024

Departments of Animal and Food Sciences, Biological Sciences, Medical and Molecular Sciences, and Microbiology Graduate Program, University of Delaware, Newark, DE, USA.

Ashley E Kramer, Alberto Berral-González, Kathryn M Ellwood, Shanshan Ding, Javier De Las Rivas

The transcriptional regulation of gene expression in the latter stages of follicular development in laying hen ovarian follicles is not well understood. Although differentially expressed genes (DEGs) have been identified in pre-recruitment and pre-ovulatory stages, the master regulators driving these DEGs remain unknown. This study addresses this knowledge gap by utilizing Master Regulator Analysis (MRA) combined with the Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNe) for the first time in laying hen research to identify master regulators that are controlling DEGs in pre-recruitment and pre-ovulatory phases.
