GOThresher: a program to remove annotation biases from protein function annotation datasets.

Parnal Joshi, Sagnik Banerjee, Xiao Hu, Pranav M Khade, Iddo Friedberg

Bioinformatics

Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA.

Published: January 2023

Motivation: Advances in sequencing technologies have led to a surge in genomic data, although the functions of many gene products coded by these genes remain unknown. While in-depth, targeted experiments that determine the functions of these gene products are crucial and routinely performed, they fail to keep up with the inflow of novel genomic data. In an attempt to address this gap, high-throughput experiments are being conducted in which a large number of genes are investigated in a single study. The annotations generated as a result of these experiments are generally biased towards a small subset of less informative Gene Ontology (GO) terms. Identifying and removing biases from protein function annotation databases is important since biases impact our understanding of protein function by providing a poor picture of the annotation landscape. Additionally, as machine learning methods for predicting protein function are becoming increasingly prevalent, it is essential that they are trained on unbiased datasets. Therefore, it is not only crucial to be aware of biases, but also to judiciously remove them from annotation datasets.

Results: We introduce GOThresher, a Python tool that identifies and removes biases in function annotations from protein function annotation databases.
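The abstract does not describe how the biases are identified, so the following is only a minimal sketch of one plausible heuristic: treat a reference that annotates an unusually large number of proteins as a high-throughput study and drop its annotations. The record layout, the function names, and the `max_proteins` cutoff are all hypothetical and are not taken from GOThresher's code or command-line interface.

```python
from collections import defaultdict

# Toy annotation records: (protein_id, go_term, reference_id).
# All identifiers here are made up for illustration.
ANNOTATIONS = [
    ("P1", "GO:0005515", "PMID:A"),
    ("P2", "GO:0005515", "PMID:A"),
    ("P3", "GO:0005515", "PMID:A"),
    ("P4", "GO:0005515", "PMID:A"),
    ("P1", "GO:0004672", "PMID:B"),
    ("P5", "GO:0003700", "PMID:C"),
]


def proteins_per_reference(annotations):
    """Count the distinct proteins annotated by each reference."""
    proteins = defaultdict(set)
    for protein, _go_term, reference in annotations:
        proteins[reference].add(protein)
    return {reference: len(ids) for reference, ids in proteins.items()}


def drop_high_throughput(annotations, max_proteins):
    """Keep only annotations whose reference annotates at most
    `max_proteins` distinct proteins (hypothetical cutoff)."""
    counts = proteins_per_reference(annotations)
    return [a for a in annotations if counts[a[2]] <= max_proteins]


filtered = drop_high_throughput(ANNOTATIONS, max_proteins=3)
for record in filtered:
    print(record)
```

In this toy dataset the reference `PMID:A` assigns the same GO term to four proteins, so a cutoff of three removes its annotations and keeps the two from the smaller studies. A suitable cutoff for real data would have to be chosen empirically.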

Availability and Implementation: GOThresher is written in Python and released via PyPI (https://pypi.org/project/gothresher/) and on the Bioconda Anaconda channel (https://anaconda.org/bioconda/gothresher). The source code is hosted on GitHub (https://github.com/FriedbergLab/GOThresher) and distributed under the GPL 3.0 license.

Supplementary Information: Supplementary data are available at Bioinformatics online.

DOI: http://dx.doi.org/10.1093/bioinformatics/btad048

Similar Publications

Analysis and identification of potential biomarkers for dysfunctional uterine bleeding.

J Reprod Immunol

January 2025

Department of Chinese Medicine Rehabilitation, The First Affiliated Hospital of Guizhou University of Traditional Chinese Medicine, Guiyang 50001, China.

N Zhang, Y Liang, Y Q Meng, Y C Li, X Lu

Clinical evidence increasingly suggests that traditional treatments for dysfunctional uterine bleeding (DUB) have limited success. In this study, blood samples from 10 DUB patients and 10 healthy controls were collected for transcriptome sequencing. Then, the differentially expressed genes (DEGs) were screened and crossed with the DUB-related module genes to obtain the target genes.

Protocol for identifying Dicer as dsRNA binding and cleaving reagent in response to transfected dsRNA.

STAR Protoc

January 2025

CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China.

Yunpeng Dai, Jiaxin Wang, Jiaqi Zhang, Xing Liu, Gang Sun

Mammalian Dicer has been shown to act on double-stranded RNAs (dsRNAs) and to be involved in antiviral immunity and immune regulation. Here, we present a protocol for identifying Dicer as a factor that binds and cleaves transfected dsRNA in cell lines, based on small RNA sequencing (RNA-seq) and dsRNA-immunoprecipitation (dsRNA-IP). We detail both the experimental procedures and the analysis of small RNA-seq data.

Targeting the IKZF1/BCL-2 axis as a novel therapeutic strategy for treating acute T-cell lymphoblastic leukemia.

Cancer Biol Ther

December 2025

Department of Hematology, Taixing People's Hospital Affiliated to Yangzhou University, Taixing, China.

Juan Li, Chunmei Ye, Hui Li, Jun Li

Objectives: Acute T-cell lymphoblastic leukemia (T-ALL) is a severe hematologic malignancy with limited treatment options and poor long-term survival. This study explores the role of IKZF1 in regulating BCL-2 expression in T-ALL.

Methods: CUT&Tag and CUT&Run assays were employed to assess IKZF1 binding to the BCL-2 promoter.

Efficacy and safety of first-line targeted synthetic DMARDs in rheumatoid arthritis patients with chronic kidney disease.

Rheumatology (Oxford)

January 2025

Nephrology Center and Department of Rheumatology, Toranomon Hospital, Tokyo, Japan.

Yusuke Yoshimura, Masayuki Yamanouchi, Ryo Koizumi, Hiroki Mizuno, Yuki Oba

Objectives: To evaluate the efficacy and safety of first-line targeted synthetic disease-modifying anti-rheumatic drugs (tsDMARDs) in patients with rheumatoid arthritis (RA) and chronic kidney disease (CKD).

Methods: This retrospective cohort study included 216 patients with RA prescribed their first tsDMARDs at two hospitals between 2013 and 2022. Dose reduction and contraindication guidelines for tsDMARDs according to kidney function were followed.

MetAssimulo 2.0: a web app for simulating realistic 1D & 2D Metabolomic 1H NMR spectra.

Bioinformatics

January 2025

Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion and Reproduction, Faculty of Medicine, Imperial College London, London, W12 0NN, United Kingdom.

Yan Yan, Beatriz Jiménez, Michael T Judge, Toby Athersuch, Maria De Iorio

Metabolomics makes extensive use of Nuclear Magnetic Resonance (NMR) spectroscopy because of its excellent reproducibility and high throughput. Both one-dimensional (1D) and two-dimensional (2D) NMR spectra provide crucial information for metabolite annotation and quantification, yet they present complex overlapping patterns that may require sophisticated machine learning algorithms to decipher. Unfortunately, the limited availability of labelled spectra can hamper the application of machine learning, especially deep learning algorithms, which require large amounts of labelled data.
