Replication and the reported crises impacting many fields of research have become a focal point for the sciences. This has led to reforms in publishing, methodological design and reporting, and increased numbers of experimental replications coordinated across many laboratories. While replication is rightly considered an indispensable tool of science, financial resources and researchers' time are quite limited.
Background: The importance of replication in the social and behavioural sciences has been emphasized for decades. Various frequentist and Bayesian approaches have been proposed to qualify a replication study as successful or unsuccessful. One of them is meta-analysis.
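To make the meta-analytic approach concrete, here is a minimal Python sketch of a fixed-effect meta-analysis pooling an original study with its replication; the effect sizes and standard errors are invented for illustration, not taken from any study discussed here.

```python
import numpy as np

# Hypothetical effect estimates (standardized mean differences) and
# standard errors for an original study and its replication.
effects = np.array([0.45, 0.12])
ses = np.array([0.15, 0.10])

# Fixed-effect meta-analysis: inverse-variance weighted average.
weights = 1.0 / ses**2
pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))

# Wald z test of the pooled effect against zero.
z = pooled / pooled_se
print(f"pooled effect = {pooled:.3f}, SE = {pooled_se:.3f}, z = {z:.2f}")
```

The pooled estimate and its test can then feed a judgment of replication success, which is the use of meta-analysis the abstract refers to.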
Linde et al. (2021) compared the "two one-sided tests", the "highest density interval-region of practical equivalence", and the "interval Bayes factor" approaches to establishing equivalence in terms of power and Type I error rate using typical decision thresholds. They found that the interval Bayes factor approach exhibited higher power but also a higher Type I error rate than the other approaches.
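As an illustration of how such error rates can be estimated, the following sketch simulates the Type I error rate of the two one-sided tests procedure when the true difference sits exactly on the equivalence bound; the sample size, bound, and threshold are arbitrary choices for illustration, not those of Linde et al.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def tost(x, y, bound, alpha=0.05):
    """Two one-sided tests for equivalence of two means within +/- bound."""
    n1, n2 = len(x), len(y)
    diff = x.mean() - y.mean()
    se = np.sqrt(x.var(ddof=1) / n1 + y.var(ddof=1) / n2)
    df = n1 + n2 - 2  # simple approximation; Welch df would also work
    p_lower = 1 - stats.t.cdf((diff + bound) / se, df)  # H0: diff <= -bound
    p_upper = stats.t.cdf((diff - bound) / se, df)      # H0: diff >= +bound
    return max(p_lower, p_upper) < alpha  # declare equivalence

# Type I error: the true difference lies exactly on the equivalence bound.
bound, n, reps = 0.5, 50, 5000
hits = sum(
    tost(rng.normal(0, 1, n), rng.normal(bound, 1, n), bound)
    for _ in range(reps)
)
print(f"estimated Type I error rate: {hits / reps:.3f}")  # close to 0.05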
Medicine regulators need to judge whether a drug's favorable effects outweigh its unfavorable effects based on a dossier submitted by an applicant, such as a pharmaceutical company. Because scientific knowledge is inherently uncertain, regulators also need to judge the credibility of these effects by identifying and evaluating uncertainties. We performed an ethnographic study of assessment procedures at the Dutch Medicines Evaluation Board (MEB) and describe how regulators evaluate the credibility of an applicant's claims about the benefits and risks of a drug in practice.
Objectives: To quantify the strength of statistical evidence of randomized controlled trials (RCTs) for novel cancer drugs approved by the Food and Drug Administration in the last 2 decades.
Study Design And Setting: We used data on overall survival (OS), progression-free survival, and tumor response for novel cancer drugs approved for the first time by the Food and Drug Administration between January 2000 and December 2020. We assessed strength of statistical evidence by calculating Bayes factors (BFs) for all available endpoints, and we pooled evidence using Bayesian fixed-effect meta-analysis for indications approved based on 2 RCTs.
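A hedged sketch of the kind of pooling described: a fixed-effect combination of two hypothetical trial estimates, followed by an approximate Bayes factor against a point null under an assumed normal prior. The log hazard ratios, standard errors, and prior scale are all illustrative assumptions, not values from the study.

```python
import numpy as np
from scipy import stats

# Hypothetical log hazard ratios and standard errors from two RCTs
# supporting a single indication.
log_hr = np.array([-0.30, -0.22])
se = np.array([0.12, 0.10])

# Fixed-effect pooling (inverse-variance weights).
w = 1.0 / se**2
pooled = np.sum(w * log_hr) / np.sum(w)
pooled_se = np.sqrt(1.0 / np.sum(w))

# Approximate Bayes factor via a normal prior on the pooled log HR:
# H0: effect = 0 versus H1: effect ~ N(0, prior_sd^2).
prior_sd = 0.5  # assumed prior scale, for illustration only
m1 = stats.norm.pdf(pooled, 0, np.sqrt(pooled_se**2 + prior_sd**2))
m0 = stats.norm.pdf(pooled, 0, pooled_se)
print(f"BF10 for pooled effect: {m1 / m0:.1f}")
```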
A large body of scientific literature in the social and behavioural sciences bases its conclusions on one or more hypothesis tests. As such, it is important to learn more about how researchers in the social and behavioural sciences interpret quantities that result from hypothesis test metrics, such as p-values and Bayes factors. In the present study, we explored the relationship between obtained statistical evidence and the degree of belief or confidence that there is a positive effect in the population of interest.
Background: Clinical trials often seek to determine the superiority, equivalence, or non-inferiority of an experimental condition (e.g., a new drug) compared to a control condition (e.g., a placebo).
Background: Publishing study results in scientific journals has been the standard way of disseminating science. However, getting results published may depend on their statistical significance. As a consequence, the published representation of scientific knowledge may be biased.
Theoretical arguments and empirical investigations indicate that a high proportion of published findings do not replicate and are likely false. The current position paper provides a broad perspective on the factors that may lead to replication failures. This broad perspective focuses on the history of reform and on opportunities for future reform.
Increased execution of replication studies contributes to the effort to restore the credibility of empirical research. However, a second generation of problems arises: the number of potential replication targets is seriously mismatched with the available resources. Given limited resources, replication target selection should be well justified, systematic, and transparently communicated.
As replications of individual studies are resource intensive, techniques for predicting replicability are required. We introduce the repliCATS (Collaborative Assessments for Trustworthy Science) process, a new method for eliciting expert predictions about the replicability of research. This process is a structured expert elicitation approach based on a modified Delphi technique applied to the evaluation of research claims in the social and behavioural sciences.
The last 25 years have shown a steady increase in attention to the Bayes factor as a tool for hypothesis evaluation and model selection. The present review highlights the potential of the Bayes factor in psychological research. We discuss six types of applications: Bayesian evaluation of point-null, interval, and informative hypotheses; Bayesian evidence synthesis; Bayesian variable selection and model averaging; and Bayesian evaluation of cognitive models.
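For readers unfamiliar with how a Bayes factor for a point null can be computed, here is a minimal sketch using the Savage-Dickey density ratio in a conjugate normal model; the simulated data and the N(0, 1) prior are assumptions chosen purely for illustration.

```python
import numpy as np
from scipy import stats

# Savage-Dickey density ratio for a point-null hypothesis in a
# conjugate normal model with known sampling variance (illustrative).
rng = np.random.default_rng(7)
data = rng.normal(0.3, 1.0, size=100)  # simulated observations
n, sigma = len(data), 1.0

prior_mean, prior_sd = 0.0, 1.0  # assumed N(0, 1) prior on the mean
# Conjugate normal posterior for the mean.
post_var = 1.0 / (1.0 / prior_sd**2 + n / sigma**2)
post_mean = post_var * (prior_mean / prior_sd**2 + n * data.mean() / sigma**2)

# BF01 is the ratio of posterior to prior density at the null value 0.
bf01 = (stats.norm.pdf(0, post_mean, np.sqrt(post_var))
        / stats.norm.pdf(0, prior_mean, prior_sd))
print(f"BF01 = {bf01:.3f}, BF10 = {1 / bf01:.1f}")
```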
Tendeiro and Kiers (2019) provide a detailed and scholarly critique of Null Hypothesis Bayesian Testing (NHBT) and its central component, the Bayes factor, which allows researchers to update knowledge and quantify statistical evidence. Tendeiro and Kiers conclude that NHBT constitutes an improvement over frequentist p-values, but primarily elaborate on a list of 11 "issues" of NHBT. We believe that several issues identified by Tendeiro and Kiers are of central importance for elucidating the complementary roles of hypothesis testing versus parameter estimation and for appreciating the virtue of statistical thinking over conducting statistical rituals.
We argue that statistical practice in the social and behavioural sciences benefits from transparency, a fair acknowledgement of uncertainty and openness to alternative interpretations. Here, to promote such a practice, we recommend seven concrete statistical procedures: (1) visualizing data; (2) quantifying inferential uncertainty; (3) assessing data preprocessing choices; (4) reporting multiple models; (5) involving multiple analysts; (6) interpreting results modestly; and (7) sharing data and code. We discuss their benefits and limitations, and provide guidelines for adoption.
Any large dataset can be analyzed in a number of ways, and it is possible that the use of different analysis strategies will lead to different results and conclusions. One way to assess whether the results obtained depend on the analysis strategy chosen is to employ multiple analysts and leave each of them free to follow their own approach. Here, we present consensus-based guidance for conducting and reporting such multi-analyst studies, and we discuss how broader adoption of the multi-analyst approach has the potential to strengthen the robustness of results and conclusions obtained from analyses of datasets in basic and applied research.
Some important research questions require the ability to find evidence for two conditions being practically equivalent. This is impossible to accomplish within the traditional frequentist null hypothesis significance testing framework; hence, other methodologies must be utilized. We explain and illustrate three approaches for finding evidence for equivalence: the frequentist two one-sided tests procedure, the Bayesian highest density interval region of practical equivalence procedure, and the Bayes factor interval null procedure.
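As a concrete illustration of one of these approaches, the sketch below implements the HDI-region-of-practical-equivalence decision rule under an approximate normal posterior; the ROPE of +/- 0.1, the vague-prior approximation, and the simulated data are all illustrative assumptions.

```python
import numpy as np
from scipy import stats

# HDI-ROPE sketch: declare equivalence when the 95% highest density
# interval of the posterior for a mean difference lies entirely inside
# a region of practical equivalence (ROPE), here assumed to be +/- 0.1.
rng = np.random.default_rng(3)
x = rng.normal(0.00, 1.0, 2000)
y = rng.normal(0.02, 1.0, 2000)

# Approximate normal posterior for the difference in means under a
# vague prior: posterior ~ N(observed difference, SE^2).
diff = x.mean() - y.mean()
se = np.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))
hdi = stats.norm.interval(0.95, loc=diff, scale=se)  # symmetric, so HDI = CI

rope = (-0.1, 0.1)
inside = rope[0] < hdi[0] and hdi[1] < rope[1]
print(f"95% HDI = ({hdi[0]:.3f}, {hdi[1]:.3f}); equivalence: {inside}")
```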
Approval and prescription of psychotropic drugs should be informed by the strength of evidence for efficacy. Using a Bayesian framework, we examined (1) whether psychotropic drugs are supported by substantial evidence (at the time of approval by the Food and Drug Administration), and (2) whether there are systematic differences across drug groups. Data from short-term, placebo-controlled phase II/III clinical trials for 15 antipsychotics, 16 antidepressants for depression, nine antidepressants for anxiety, and 20 drugs for attention deficit hyperactivity disorder (ADHD) were extracted from FDA reviews.
Current discussions on improving the reproducibility of science often revolve around statistical innovations. However, equally important for improving methodological rigour is a valid operationalization of phenomena. Operationalization is the process of translating theoretical constructs into measurable laboratory quantities.
Structured protocols offer a transparent and systematic way to elicit and aggregate probabilistic predictions from multiple experts. These judgements can be aggregated behaviourally or mathematically to derive a final group prediction. Mathematical rules (e.g., simple or weighted averaging) are one example of the latter.
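A minimal sketch of mathematical aggregation, showing an equal-weight linear opinion pool and a log-odds pool over invented expert probabilities; the specific rules and weights used in any given protocol may differ.

```python
import numpy as np

# Illustrative expert probability judgements that a claim will replicate
# (values are invented).
probs = np.array([0.55, 0.70, 0.40, 0.65])

# Linear opinion pool: simple (equal-weight) average of probabilities.
linear_pool = probs.mean()

# Log-odds pooling: average on the logit scale, then transform back.
logits = np.log(probs / (1 - probs))
logodds_pool = 1 / (1 + np.exp(-logits.mean()))

print(f"linear pool: {linear_pool:.2f}, log-odds pool: {logodds_pool:.2f}")
```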
The practice of sequentially testing a null hypothesis as data are collected, until the null hypothesis is rejected, is known as optional stopping. It is well known that optional stopping is problematic in the context of p value-based null hypothesis significance testing: the false-positive rate quickly exceeds the nominal significance level of a single test. However, the state of affairs under null hypothesis Bayesian testing, where p values are replaced by Bayes factors, has, perhaps surprisingly, attracted much less consensus.
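The inflation of the false-positive rate under p value-based optional stopping is easy to demonstrate by simulation; the sketch below tests after every batch of ten observations under a true null, with batch size, maximum sample size, and threshold chosen arbitrarily for illustration.

```python
import numpy as np
from scipy import stats

# Simulate optional stopping under a true null: run a t test after every
# batch of observations and stop as soon as p < .05.
rng = np.random.default_rng(11)
reps, batch, max_n, alpha = 2000, 10, 200, 0.05

false_positives = 0
for _ in range(reps):
    data = np.empty(0)
    while data.size < max_n:
        data = np.append(data, rng.normal(0, 1, batch))  # H0 is true
        if stats.ttest_1samp(data, 0).pvalue < alpha:
            false_positives += 1
            break

print(f"false-positive rate with optional stopping: "
      f"{false_positives / reps:.2f}")  # well above the nominal .05
```

Whether replacing the p value rule with a Bayes factor threshold escapes this problem is precisely the question the study takes up.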
To overcome the frequently debated crisis of confidence, replication studies are becoming increasingly common. Multiple frequentist and Bayesian measures have been proposed to evaluate whether a replication is successful, but little is known about which method best captures replication success. This study is one of the first attempts to compare a number of quantitative measures of replication success with respect to their ability to draw the correct inference when the underlying truth is known, while taking publication bias into account.
The crisis of confidence has undermined the trust that researchers place in the findings of their peers. To increase trust in research, initiatives such as preregistration, which aim to prevent various questionable research practices, have been suggested. As it stands, however, no empirical evidence exists that preregistration does increase perceptions of trust.