GOThresher: a program to remove annotation biases from protein function annotation datasets.

Bioinformatics

Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA.

Published: January 2023

Motivation: Advances in sequencing technologies have led to a surge in genomic data, yet the functions of many of the gene products encoded in these data remain unknown. In-depth, targeted experiments that determine the functions of these gene products are crucial and routinely performed, but they cannot keep pace with the inflow of novel genomic data. To address this gap, high-throughput experiments investigate a large number of genes in a single study. The annotations generated by these experiments are generally biased towards a small subset of less informative Gene Ontology (GO) terms. Identifying and removing biases from protein function annotation databases is important because such biases distort our understanding of protein function by giving a poor picture of the annotation landscape. Additionally, as machine learning methods for predicting protein function become increasingly prevalent, it is essential that they be trained on unbiased datasets. It is therefore crucial not only to be aware of biases, but also to remove them judiciously from annotation datasets.

Results: We introduce GOThresher, a Python tool that identifies and removes biases in function annotations from protein function annotation databases.
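To make the kind of bias described in the Motivation concrete, the following is a minimal Python sketch of one plausible filter: it counts how many distinct proteins each reference annotates and drops annotations from references above a cutoff. It is an illustration under stated assumptions, not GOThresher's actual interface or algorithm; the toy records, the function names proteins_per_reference and drop_high_throughput, and the max_proteins cutoff are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy annotations: (protein_id, go_term, reference_id).
# PMID:1 stands in for a high-throughput study that assigns the same
# GO term to many proteins.
ANNOTATIONS = [
    ("P1", "GO:0005515", "PMID:1"),
    ("P2", "GO:0005515", "PMID:1"),
    ("P3", "GO:0005515", "PMID:1"),
    ("P4", "GO:0005515", "PMID:1"),
    ("P1", "GO:0004672", "PMID:2"),
    ("P5", "GO:0003700", "PMID:3"),
]


def proteins_per_reference(annotations):
    """Count the distinct proteins annotated by each reference."""
    proteins = defaultdict(set)
    for protein, _term, reference in annotations:
        proteins[reference].add(protein)
    return {reference: len(ids) for reference, ids in proteins.items()}


def drop_high_throughput(annotations, max_proteins):
    """Keep annotations whose reference annotates at most max_proteins proteins.

    The cutoff is an assumed, user-chosen parameter for this sketch.
    """
    counts = proteins_per_reference(annotations)
    return [row for row in annotations if counts[row[2]] <= max_proteins]


if __name__ == "__main__":
    for row in drop_high_throughput(ANNOTATIONS, max_proteins=3):
        print(*row, sep="\t")
```

With the toy data, the single reference that annotates four proteins is removed while the two targeted studies are retained. A real filter would read annotations from a database export and would need a cutoff justified by the data rather than the arbitrary value used here.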

Availability and Implementation: GOThresher is written in Python and released via PyPI (https://pypi.org/project/gothresher/) and on the Bioconda Anaconda channel (https://anaconda.org/bioconda/gothresher). The source code is hosted on GitHub (https://github.com/FriedbergLab/GOThresher) and distributed under the GPL 3.0 license.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
http://dx.doi.org/10.1093/bioinformatics/btad048
