Current variant calling (VC) approaches have been designed to leverage populations of long-range haplotypes and were benchmarked using populations of European descent, whereas most genetic diversity is found in non-European populations, such as those of Africa. When applied to these genetically diverse populations, VC tools may produce false positive and false negative results, which can lead to misleading conclusions about the prioritization of mutations and the clinical relevance and actionability of genes. The most prominent question is which tool or pipeline achieves high sensitivity and precision when analysing African data with either low or high sequence coverage, given the high genetic diversity and heterogeneity of these data. Here, a total of 100 synthetic Whole Genome Sequencing (WGS) samples, mimicking the genetic profiles of African and European subjects at specific coverage levels (high/low), were generated to assess the performance of nine different VC tools on these contrasting datasets. The performance of each tool was assessed in terms of false positive and false negative call rates by comparing the simulated golden variants with the variants identified by that tool. Combining our results on sensitivity and positive predictive value (PPV), VarDict [PPV = 0.999 and Matthews correlation coefficient (MCC) = 0.832] and BCFtools (PPV = 0.999 and MCC = 0.813) perform best on African population data at both high and low coverage. Overall, current VC tools produce higher false positive and false negative rates when analysing African data than when analysing European data. This highlights the need to develop VC approaches with high sensitivity and precision tailored to populations characterized by high genetic variation and low linkage disequilibrium.
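The evaluation described above compares a golden (truth) variant set with each tool's calls and summarizes the result as sensitivity, PPV and MCC. The following is a minimal sketch of that arithmetic, not the authors' pipeline. It assumes variants are represented as (chromosome, position, ref, alt) tuples, and that the number of evaluated sites (`n_sites`, a hypothetical parameter) is known so that true negatives can be counted for the MCC. The function names and the toy data are illustrative only.

```python
import math


def confusion_counts(truth, called, n_sites):
    """Count TP, FP, FN, TN from two sets of variant keys.

    truth   : set of golden variants, e.g. (chrom, pos, ref, alt) tuples
    called  : set of variants reported by a caller, same key format
    n_sites : assumed total number of evaluated sites (needed for TN)
    """
    tp = len(truth & called)   # called and present in the golden set
    fp = len(called - truth)   # called but absent from the golden set
    fn = len(truth - called)   # golden variants the caller missed
    tn = n_sites - tp - fp - fn
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Return sensitivity, PPV and MCC; undefined ratios fall back to 0.0."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "ppv": ppv, "mcc": mcc}


# Toy data (hypothetical): four golden variants, four calls, ten sites.
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr1", 300, "G", "A"),
    ("chr1", 400, "T", "C"),
}
called = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr1", 300, "G", "A"),
    ("chr1", 500, "A", "T"),  # false positive
}

counts = confusion_counts(truth, called, n_sites=10)
result = metrics(*counts)
print(counts)
print(result)
```

In this toy case one golden variant is missed and one spurious call is made, so sensitivity and PPV are both 0.75. In practice the choice of `n_sites` strongly affects the MCC, because true negatives dominate genome-wide counts.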

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8294538 (PMC)
http://dx.doi.org/10.1093/bib/bbaa366 (DOI)

Similar Publications

Lymphadenopathy refers to lymph nodes of abnormal size or consistency and has many causes. We employed the deep convolutional neural network ResNet-34 to detect and classify CT images from patients with abdominal lymphadenopathy and healthy controls. We created a single database containing 1400 source CT images for patients with abdominal lymphadenopathy (n = 700) and healthy controls (n = 700).

Large-Scale Validation of the Feasibility of GPT-4 as a Proofreading Tool for Head CT Reports.

Radiology

January 2025

From the Departments of Biomedical Systems Informatics (S.K., Jaewoong Kim, C.H., D.Y.) and Neurology (Joonho Kim, J.Y.), Yonsei University College of Medicine, 50-1 Yonsei-ro, Seodaemun-gu, Seoul 03722, Republic of Korea; Department of Radiology, Central Draft Physical Examination Office of Military Manpower Administration, Daegu, Republic of Korea (D.K.); Department of Radiology, Research Institute of Radiological Science and Center for Clinical Imaging Data Science (H.J.S., Y.K., S.J.), and Center for Digital Health (H.J.S., D.Y.), Yongin Severance Hospital, Yonsei University College of Medicine, Yongin, Republic of Korea; Department of Radiology, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul, Republic of Korea (S.H.L.); Departments of Radiology (M.H.) and Neurology (S.J.L.), Ajou University Hospital, Ajou University School of Medicine, Suwon, Republic of Korea; and Institute for Innovation in Digital Healthcare, Severance Hospital, Seoul, Republic of Korea (D.Y.).

Background The increasing workload of radiologists can lead to burnout and errors in radiology reports. Large language models, such as OpenAI's GPT-4, hold promise as error revision tools for radiology. Purpose To test the feasibility of GPT-4 use by determining its error detection, reasoning, and revision performance on head CT reports with varying error types and to validate its clinical utility by comparison with human readers.

Science is based on ideas that might be true or false in describing reality. In order to discern between these two, scientists conduct studies that can reveal evidence for an idea, i.e.

Background: Patients after transcatheter pulmonary valve implantation (TPVI) are at increased risk for infective prosthetic valve endocarditis. Diagnosis of infective endocarditis (IE) following TPVI is particularly difficult due to impaired visualization of the transcatheter pulmonary valve (TPV) with echocardiography [Delgado V, Ajmone Marsan N, de Waha S, Bonaros N, Brida M, Burri H, et al. 2023 ESC guidelines for the management of endocarditis.

Smart cities deploy various sensors such as microphones and RGB cameras to collect data to improve the safety and comfort of the citizens. As data annotation is expensive, self-supervised methods such as contrastive learning are used to learn audio-visual representations for downstream tasks. Focusing on surveillance data, we investigate two common limitations of audio-visual contrastive learning: false negatives and the minimal sufficient information bottleneck.
