Publications by authors named "Ravenzwaaij D"

Replication and the reported crises impacting many fields of research have become a focal point for the sciences. This has led to reforms in publishing, methodological design and reporting, and increased numbers of experimental replications coordinated across many laboratories. While replication is rightly considered an indispensable tool of science, financial resources and researchers' time are quite limited.


Background: The importance of replication in the social and behavioural sciences has been emphasized for decades. Various frequentist and Bayesian approaches have been proposed to qualify a replication study as successful or unsuccessful. One of them is meta-analysis.


Linde et al. (2021) compared the "two one-sided tests", the "highest density interval-region of practical equivalence", and the "interval Bayes factor" approaches to establishing equivalence in terms of power and Type I error rate using typical decision thresholds. They found that the interval Bayes factor approach exhibited a higher power but also a higher Type I error rate than the other approaches.


Medicine regulators need to judge whether a drug's favorable effects outweigh its unfavorable effects based on a dossier submitted by an applicant, such as a pharmaceutical company. Because scientific knowledge is inherently uncertain, regulators also need to judge the credibility of these effects by identifying and evaluating uncertainties. We performed an ethnographic study of assessment procedures at the Dutch Medicines Evaluation Board (MEB) and describe how regulators evaluate the credibility of an applicant's claims about the benefits and risks of a drug in practice.

Article Synopsis
  • Many-analyst studies investigate how well different analysis teams can interpret the same dataset and how robust their conclusions are against alternative methods.
  • Typically, these studies only report one outcome measure, like effect size, making it hard to grasp the full impact of different analysis choices.
  • To address this, researchers created the Subjective Evidence Evaluation Survey (SEES) using feedback from experts, helping to evaluate the quality of research design and evidence strength, ultimately offering a deeper understanding of analysis outcomes.

Objectives: To quantify the strength of statistical evidence of randomized controlled trials (RCTs) for novel cancer drugs approved by the Food and Drug Administration in the last 2 decades.

Study Design And Setting: We used data on overall survival (OS), progression-free survival, and tumor response for novel cancer drugs approved for the first time by the Food and Drug Administration between January 2000 and December 2020. We assessed strength of statistical evidence by calculating Bayes factors (BFs) for all available endpoints, and we pooled evidence using Bayesian fixed-effect meta-analysis for indications approved based on 2 RCTs.
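As an illustration of the pooling step, the classical inverse-variance fixed-effect combination of per-trial effect estimates can be sketched as follows. This is the standard frequentist version, shown only to make the idea concrete; the study itself pools trial-level evidence with a Bayesian fixed-effect meta-analysis, and the example numbers below are hypothetical:

```python
from math import sqrt

def fixed_effect_pool(estimates, std_errors):
    """Inverse-variance fixed-effect pooling of per-trial effect estimates
    (e.g. log hazard ratios for overall survival). Each trial is weighted
    by its precision 1 / se^2; returns the pooled estimate and its SE."""
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
    pooled_se = sqrt(1 / sum(weights))
    return pooled, pooled_se

# Hypothetical example: two RCTs reporting log hazard ratios.
pooled, se = fixed_effect_pool([-0.25, -0.10], [0.10, 0.20])
```

The more precise trial dominates the pooled estimate, which is the defining behaviour of fixed-effect (as opposed to random-effects) pooling.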


A large portion of the scientific literature in the social and behavioural sciences bases its conclusions on one or more hypothesis tests. As such, it is important to obtain more knowledge about how researchers in the social and behavioural sciences interpret quantities that result from hypothesis test metrics, such as p-values and Bayes factors. In the present study, we explored the relationship between obtained statistical evidence and the degree of belief or confidence that there is a positive effect in the population of interest.


Background: Clinical trials often seek to determine the superiority, equivalence, or non-inferiority of an experimental condition (e.g., a new drug) compared to a control condition (e.


Background: Publishing study results in scientific journals has been the standard way of disseminating science. However, getting results published may depend on their statistical significance. The consequence of this is that the representation of scientific knowledge might be biased.


Theoretical arguments and empirical investigations indicate that a high proportion of published findings do not replicate and are likely false. The current position paper provides a broad perspective on the factors that may lead to replication failures, focusing on the history of reform and on opportunities for future reform.


Increased execution of replication studies contributes to the effort to restore the credibility of empirical research. However, a second generation of problems arises: the number of potential replication targets is seriously mismatched with available resources. Given limited resources, replication target selection should be well-justified, systematic, and transparently communicated.


As replications of individual studies are resource intensive, techniques for predicting replicability are required. We introduce the repliCATS (Collaborative Assessments for Trustworthy Science) process, a new method for eliciting expert predictions about the replicability of research. This process is a structured expert elicitation approach based on a modified Delphi technique applied to the evaluation of research claims in the social and behavioural sciences.


The last 25 years have shown a steady increase in attention for the Bayes factor as a tool for hypothesis evaluation and model selection. The present review highlights the potential of the Bayes factor in psychological research. We discuss six types of applications: Bayesian evaluation of point null, interval, and informative hypotheses, Bayesian evidence synthesis, Bayesian variable selection and model averaging, and Bayesian evaluation of cognitive models.
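One of the listed applications, Bayesian evaluation of a point null hypothesis, can be sketched with the well-known BIC approximation to the Bayes factor. This is a rough stand-in for the dedicated priors used in practice, shown only to make the quantity concrete; it compares a Gaussian model with the mean fixed at zero against a free-mean alternative:

```python
from math import log, exp

def bic_bayes_factor_01(data):
    """Approximate Bayes factor BF01 (evidence for the point null mean = 0
    over a free-mean alternative) via BF01 ~ exp((BIC1 - BIC0) / 2),
    assuming Gaussian data. Values above 1 favour the null."""
    n = len(data)
    m = sum(data) / n
    sse0 = sum(x * x for x in data)          # null model: mean fixed at 0, k = 1 (variance)
    sse1 = sum((x - m) ** 2 for x in data)   # alternative: free mean, k = 2
    bic0 = n * log(sse0 / n) + 1 * log(n)
    bic1 = n * log(sse1 / n) + 2 * log(n)
    return exp((bic1 - bic0) / 2)
```

Unlike a p-value, the resulting quantity can express evidence *for* the null: data centred on zero yield BF01 > 1, while clearly shifted data yield BF01 near 0.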


Tendeiro and Kiers (2019) provide a detailed and scholarly critique of Null Hypothesis Bayesian Testing (NHBT) and its central component, the Bayes factor, that allows researchers to update knowledge and quantify statistical evidence. Tendeiro and Kiers conclude that NHBT constitutes an improvement over frequentist p-values, but primarily elaborate on a list of 11 "issues" of NHBT. We believe that several issues identified by Tendeiro and Kiers are of central importance for elucidating the complementary roles of hypothesis testing versus parameter estimation and for appreciating the virtue of statistical thinking over conducting statistical rituals.


We argue that statistical practice in the social and behavioural sciences benefits from transparency, a fair acknowledgement of uncertainty and openness to alternative interpretations. Here, to promote such a practice, we recommend seven concrete statistical procedures: (1) visualizing data; (2) quantifying inferential uncertainty; (3) assessing data preprocessing choices; (4) reporting multiple models; (5) involving multiple analysts; (6) interpreting results modestly; and (7) sharing data and code. We discuss their benefits and limitations, and provide guidelines for adoption.


Any large dataset can be analyzed in a number of ways, and it is possible that the use of different analysis strategies will lead to different results and conclusions. One way to assess whether the results obtained depend on the analysis strategy chosen is to employ multiple analysts and leave each of them free to follow their own approach. Here, we present consensus-based guidance for conducting and reporting such multi-analyst studies, and we discuss how broader adoption of the multi-analyst approach has the potential to strengthen the robustness of results and conclusions obtained from analyses of datasets in basic and applied research.


Some important research questions require the ability to find evidence for two conditions being practically equivalent. This is impossible to accomplish within the traditional frequentist null hypothesis significance testing framework; hence, other methodologies must be utilized. We explain and illustrate three approaches for finding evidence for equivalence: The frequentist two one-sided tests procedure, the Bayesian highest density interval region of practical equivalence procedure, and the Bayes factor interval null procedure.
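Of the three, the two one-sided tests (TOST) procedure is the simplest to sketch. The version below uses a normal approximation in place of the t distribution, so it illustrates the logic rather than the exact procedure; the equivalence bounds `low` and `high` must be chosen on substantive grounds:

```python
from statistics import NormalDist, mean, stdev
from math import sqrt

def tost_equivalence(x, y, low, high, alpha=0.05):
    """Frequentist two one-sided tests (TOST) for equivalence of two group
    means, using a normal approximation to the t distribution. Equivalence
    is declared only if the mean difference is significantly above `low`
    AND significantly below `high` (both one-sided tests reject)."""
    diff = mean(x) - mean(y)
    se = sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))  # Welch-style SE
    z = NormalDist()
    p_lower = 1 - z.cdf((diff - low) / se)   # H0a: diff <= low
    p_upper = z.cdf((diff - high) / se)      # H0b: diff >= high
    p_tost = max(p_lower, p_upper)           # overall TOST p-value
    return p_tost, p_tost < alpha
```

Note that a non-significant ordinary t-test is not evidence of equivalence; TOST inverts the burden of proof by making "the difference lies outside the bounds" the null to be rejected.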


Approval and prescription of psychotropic drugs should be informed by the strength of evidence for efficacy. Using a Bayesian framework, we examined (1) whether psychotropic drugs are supported by substantial evidence (at the time of approval by the Food and Drug Administration), and (2) whether there are systematic differences across drug groups. Data from short-term, placebo-controlled phase II/III clinical trials for 15 antipsychotics, 16 antidepressants for depression, nine antidepressants for anxiety, and 20 drugs for attention deficit hyperactivity disorder (ADHD) were extracted from FDA reviews.


Current discussions on improving the reproducibility of science often revolve around statistical innovations. However, equally important for improving methodological rigour is a valid operationalization of phenomena. Operationalization is the process of translating theoretical constructs into measurable laboratory quantities.


Structured protocols offer a transparent and systematic way to elicit and aggregate probabilistic predictions from multiple experts. These judgements can be aggregated behaviourally or mathematically to derive a final group prediction. Mathematical rules (e.
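Two standard mathematical aggregation rules, the linear and logarithmic opinion pools, can be sketched as follows. The equal default weights are an assumption for illustration; structured protocols often use performance-based weights instead:

```python
from math import prod

def linear_pool(probs, weights=None):
    """Linear opinion pool: the aggregate probability is a weighted
    arithmetic mean of the experts' probabilities."""
    if weights is None:
        weights = [1 / len(probs)] * len(probs)
    assert abs(sum(weights) - 1) < 1e-9, "weights must sum to 1"
    return sum(w * p for w, p in zip(weights, probs))

def log_pool(probs, weights=None):
    """Logarithmic opinion pool for a binary event: a weighted geometric
    mean of the probabilities, renormalised so the result is in [0, 1]."""
    if weights is None:
        weights = [1 / len(probs)] * len(probs)
    num = prod(p ** w for p, w in zip(probs, weights))
    den = prod((1 - p) ** w for p, w in zip(probs, weights))
    return num / (num + den)
```

The linear pool never leaves the range spanned by the individual judgements, whereas the log pool rewards agreement by pushing the aggregate toward the extremes.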

Article Synopsis
  • Remdesivir, authorized for treating COVID-19 in the USA and Europe, had initial approvals based on two clinical trials, while a third study by Wang et al. was inconclusive due to low power.
  • Bayesian reanalysis showed mixed evidence regarding clinical improvement and moderate evidence against reduced mortality from remdesivir treatment, supported by additional studies post-approval.
  • Regulatory bodies should consider all evidence, utilizing Bayesian methods to interpret non-significant results, especially when clinical efficacy data is limited.

The practice of sequentially testing a null hypothesis as data are collected until the null hypothesis is rejected is known as optional stopping. It is well known that optional stopping is problematic in the context of p value-based null hypothesis significance testing: The false-positive rates quickly overcome the single test's significance level. However, the state of affairs under null hypothesis Bayesian testing, where p values are replaced by Bayes factors, has, perhaps surprisingly, generated far less consensus.
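The p-value inflation described here is easy to demonstrate by simulation. The sketch below repeatedly tests a true null, peeking after every batch of observations and stopping at the first p < .05; the sample sizes, number of looks, and simulation count are arbitrary illustrative choices:

```python
import random
from statistics import NormalDist, mean, stdev
from math import sqrt

def optional_stopping_fpr(n_sims=2000, n_min=10, n_max=100, step=10,
                          alpha=0.05, seed=0):
    """Estimate the false-positive rate of 'peek every `step` observations,
    stop as soon as p < alpha' under a true null (one-sample z test on
    standard-normal data, so any rejection is a false positive)."""
    rng = random.Random(seed)
    z = NormalDist()
    false_positives = 0
    for _ in range(n_sims):
        data = []
        for n in range(n_min, n_max + 1, step):
            data.extend(rng.gauss(0, 1) for _ in range(n - len(data)))
            stat = mean(data) / (stdev(data) / sqrt(len(data)))
            p = 2 * (1 - z.cdf(abs(stat)))
            if p < alpha:           # researcher stops and "finds" an effect
                false_positives += 1
                break
    return false_positives / n_sims
```

With ten looks at the data, the realised false-positive rate lands well above the nominal 5%, which is the core of the problem the abstract describes.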


To overcome the frequently debated crisis of confidence, replicating studies is becoming increasingly common. Multiple frequentist and Bayesian measures have been proposed to evaluate whether a replication is successful, but little is known about which method best captures replication success. This study is one of the first attempts to compare a number of quantitative measures of replication success with respect to their ability to draw the correct inference when the underlying truth is known, while taking publication bias into account.


The crisis of confidence has undermined the trust that researchers place in the findings of their peers. In order to increase trust in research, initiatives such as preregistration have been suggested, which aim to prevent various questionable research practices. As it stands, however, no empirical evidence exists that preregistration does increase perceptions of trust.
