AI Article Synopsis

  • The development of DNASimCLR addresses the challenges of extracting features from complex microbial sequence data using an unsupervised approach rather than relying on limited labeled datasets.
  • By employing convolutional neural networks and the SimCLR contrastive learning framework, DNASimCLR has been shown to extract features effectively from unannotated gene sequences, performing as well as or better than existing methods.
  • As a flexible and database-independent tool, DNASimCLR is particularly advantageous for classifying novel gene sequences, which makes it useful for a range of genomics applications.

Article Abstract

Background: Rapid advances in deep neural network models have significantly enhanced the ability to extract features from microbial sequence data, which is critical for addressing biological challenges. However, the scarcity and complexity of labeled microbial data pose substantial difficulties for supervised learning approaches. To address these issues, we propose DNASimCLR, an unsupervised framework designed for efficient feature extraction from gene sequence data.

Results: DNASimCLR uses convolutional neural networks and SimCLR, a contrastive learning framework, to extract intricate features from diverse microbial gene sequences. Pre-training was conducted on two classic large-scale unlabeled datasets encompassing metagenomes and viral gene sequences. Subsequent classification tasks were performed by fine-tuning the pretrained model. Our experiments demonstrate that DNASimCLR is at least comparable to state-of-the-art techniques for gene sequence classification. Among convolutional neural network (CNN)-based approaches, DNASimCLR surpasses the latest existing feature extraction methods. Furthermore, the model performs strongly across diverse tasks in analyzing biological sequence data, showcasing its robust adaptability.
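As a rough illustration of the contrastive objective that SimCLR-style pre-training relies on, the sketch below computes the standard NT-Xent loss over two augmented views of each sequence embedding. The abstract does not specify the encoder architecture, the sequence augmentations, or the temperature, so the function name `nt_xent_loss`, the temperature value, and the toy embeddings are assumptions made for illustration only, not the authors' implementation.

```python
import math


def _normalize(vec):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def nt_xent_loss(view_a, view_b, temperature=0.5):
    """Standard SimCLR NT-Xent loss (illustrative sketch, hypothetical name).

    view_a[i] and view_b[i] are embeddings of two augmented views of the
    same sequence; every other embedding in the batch acts as a negative.
    The temperature default is an assumption, not a value from the paper.
    """
    n = len(view_a)
    z = [_normalize(v) for v in list(view_a) + list(view_b)]
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same sequence
        # Temperature-scaled cosine similarity to every embedding except itself.
        logits = {
            k: sum(a * b for a, b in zip(z[i], z[k])) / temperature
            for k in range(2 * n)
            if k != i
        }
        # Numerically stable log-sum-exp over the positive and all negatives.
        peak = max(logits.values())
        log_denominator = peak + math.log(
            sum(math.exp(s - peak) for s in logits.values())
        )
        total += log_denominator - logits[positive]
    return total / (2 * n)


# Toy embeddings: three sequences, each with two identical "views".
embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned = nt_xent_loss(embeddings, embeddings)
# Pairing each sequence with a view of a different sequence should cost more.
mismatched = nt_xent_loss(embeddings, embeddings[1:] + embeddings[:1])
```

The loss is low when the two views of the same sequence are embedded close together and far from other sequences, which is the property that lets an encoder learn from unlabeled data before fine-tuning on a labeled classification task.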

Conclusions: DNASimCLR represents a robust and database-agnostic solution for gene sequence classification. Its versatility allows it to perform well in scenarios involving novel or previously unseen gene sequences, making it a valuable tool for diverse applications in genomics.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11476100
DOI: http://dx.doi.org/10.1186/s12859-024-05955-8


