Publications by authors named "W A Wilbur"

Summary: Over 55% of author names in PubMed are ambiguous: the same name is shared by different individual researchers. This poses significant challenges on precise literature retrieval for author name queries, a common behavior in biomedical literature search. In response, we present a comprehensive dataset of disambiguated authors.

View Article and Find Full Text PDF

Motivation: Information retrieval (IR) is essential in biomedical knowledge acquisition and clinical decision support. While recent progress has shown that language model encoders perform better semantic retrieval, training such models requires abundant query-article annotations that are difficult to obtain in biomedicine. As a result, most biomedical IR systems only conduct lexical matching.

View Article and Find Full Text PDF

A significant percentage of COVID-19 survivors experience ongoing multisystemic symptoms that often affect daily living, a condition known as Long Covid or post-acute-sequelae of SARS-CoV-2 infection. However, identifying scientific articles relevant to Long Covid is challenging since there is no standardized or consensus terminology. We developed an iterative human-in-the-loop machine learning framework combining data programming with active learning into a robust ensemble model, demonstrating higher specificity and considerably higher sensitivity than other methods.

View Article and Find Full Text PDF

Objective: A significant number of recent articles in PubMed have full text available in PubMed Central®, and the availability of full texts has been consistently growing. However, it is not currently possible for a user to simultaneously query the contents of both databases and receive a single integrated search result. In this study, we investigate how to score full text articles given a multitoken query and how to combine those full text article scores with scores originating from abstracts and achieve an overall improved retrieval performance.

View Article and Find Full Text PDF

To inform continuous and rigorous reflection about the description of human populations in genomics research, this study investigates the historical and contemporary use of the terms "ancestry," "ethnicity," "race," and other population labels in The American Journal of Human Genetics from 1949 to 2018. We characterize these terms' frequency of use and assess their odds of co-occurrence with a set of social and genetic topical terms. Throughout The Journal's 70-year history, "ancestry" and "ethnicity" have increased in use, appearing in 33% and 26% of articles in 2009-2018, while the use of "race" has decreased, occurring in 4% of articles in 2009-2018.

View Article and Find Full Text PDF