Allocation strategies improve the efficiency of crowdsourcing by decreasing the work needed to complete individual tasks accurately. However, these algorithms introduce bias by preferentially allocating workers to easy tasks, so the set of completed tasks is no longer representative of all tasks. This bias complicates inference of problem-wide properties, such as typical task difficulty, and of crowd properties, such as worker completion times: important information that goes beyond the crowd responses themselves. Here we study inference about problem properties when an allocation algorithm is used to improve crowd efficiency. We introduce Decision-Explicit Probability Sampling (DEPS), a novel method for inferring problem properties while accounting for the potential bias introduced by an allocation strategy. Experiments on real and synthetic crowdsourcing data show that DEPS outperforms baseline inference methods while still leveraging the efficiency gains of the allocation method. The ability to infer general properties accurately from non-representative data allows crowdsourcers to extract more knowledge from a given crowdsourced dataset.
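The abstract does not describe the internals of DEPS, so the following is only a hedged illustration of the bias it targets, not the authors' method. It assumes a hypothetical allocation rule in which a task with difficulty d in [0, 1] is completed with a known probability 0.9 - 0.8d, and it compares the naive mean difficulty of completed tasks against a standard inverse-probability-weighted (Hájek) estimate. The functions `completion_probability`, `allocate`, `naive_mean`, and `ipw_mean` are illustrative names, not part of any published DEPS code.

```python
import random
from statistics import mean


def completion_probability(difficulty):
    # Hypothetical allocation rule (assumption): easy tasks are more likely
    # to be completed. Maps difficulty in [0, 1] to a probability in [0.1, 0.9].
    return 0.9 - 0.8 * difficulty


def allocate(difficulties, rng):
    # Simulate which tasks end up completed; record each completed task's
    # difficulty together with its (known) inclusion probability.
    completed = []
    for d in difficulties:
        p = completion_probability(d)
        if rng.random() < p:
            completed.append((d, p))
    return completed


def naive_mean(completed):
    # Treats completed tasks as if they were a representative sample.
    return mean(d for d, _ in completed)


def ipw_mean(completed):
    # Hajek estimator: weight each completed task by 1/p, then normalize
    # by the total weight so the estimate is a weighted average.
    numerator = sum(d / p for d, p in completed)
    denominator = sum(1.0 / p for _, p in completed)
    return numerator / denominator


rng = random.Random(0)
# Hypothetical task pool: difficulties drawn uniformly from [0, 1].
difficulties = [rng.random() for _ in range(100_000)]
completed = allocate(difficulties, rng)

true_mean = mean(difficulties)
naive = naive_mean(completed)
ipw = ipw_mean(completed)

print(f"true mean difficulty: {true_mean:.3f}")
print(f"naive estimate:       {naive:.3f}")
print(f"IPW estimate:         {ipw:.3f}")
```

Under this assumed rule, the naive estimate converges to about 0.37 rather than the true 0.5, because easy tasks are over-represented among completed ones; reweighting by 1/p removes that bias. The correction relies on the inclusion probabilities being known, which holds here only because the simulated allocation rule records them.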
Source | Type
---|---
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9046272 | PMC
http://dx.doi.org/10.1038/s41598-022-10794-9 | DOI