The genome-wide association study (GWAS) aims to detect associations between individual single nucleotide polymorphisms (SNPs) or SNP interactions and phenotypes in order to decipher genetic mechanisms. Existing GWAS analysis tools have different focuses and advantages, but they require tedious and heterogeneous computational configurations. This makes it inconvenient for researchers to choose and apply these tools and to analyze their results statistically and biologically for different purposes.
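For readers new to GWAS, the single-SNP case can be illustrated with the standard allelic chi-square test on a 2x2 table of allele counts in cases versus controls. The sketch below is a generic textbook test, not the tool described in this abstract; the function name `allelic_chi2` and the genotype-count input format are assumptions made for illustration.

```python
import math


def allelic_chi2(case_genotypes, control_genotypes):
    """Allelic chi-square test (1 df) for one biallelic SNP.

    Each argument is a tuple of genotype counts (AA, Aa, aa).
    Returns (chi-square statistic, p-value). Assumes both alleles are
    observed; a monomorphic SNP would produce a zero expected count.
    """

    def allele_counts(genotypes):
        hom_ref, het, hom_alt = genotypes
        # Each diploid individual carries two alleles.
        return (2 * hom_ref + het, het + 2 * hom_alt)

    # 2x2 table: rows = cases/controls, columns = A/a allele counts.
    table = [allele_counts(case_genotypes), allele_counts(control_genotypes)]
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: cases are enriched for allele a.
print(allelic_chi2((10, 40, 50), (50, 40, 10)))
```

For these hypothetical counts the statistic is 64, giving a very small p-value; identical case and control counts give a statistic of 0 and a p-value of 1. Genome-wide, this test is repeated for every SNP, which is why multiple-testing correction matters in practice.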
Nucleic Acids Res
January 2025
Identifying cell populations associated with risk variants is essential for uncovering cell-specific mechanisms that drive disease development and progression. Integrating genome-wide association studies (GWAS) with single-cell RNA sequencing (scRNA-seq) has become an effective strategy for detecting trait-cell relationships. The accumulation of trait-related single-cell data has created an urgent need for its comprehensive processing.
Electronic health records (EHRs) contain many valuable medical entities and their relationships. Although biomedical relation extraction has achieved good results in mining EHRs and constructing biomedical knowledge bases, some problems remain. Overlapping triplets may contain implicit, complex associations between entities and relations, and ignoring these interactions can reduce the accuracy of entity extraction.
BMC Bioinformatics
May 2024
IEEE Trans Neural Netw Learn Syst
April 2024
Contrastive learning (CL) has emerged as a powerful approach for self-supervised learning. However, it suffers from sampling bias, which hinders its performance. Although the mainstream solutions, hard negative mining (HNM) and supervised CL (SCL), have been proposed to mitigate this critical issue, they do not address it effectively in graph CL (GCL).
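Sampling bias can be made concrete with an InfoNCE-style objective: negatives are usually drawn at random, so some of them may share the anchor's latent class (false negatives) and are wrongly pushed apart. The sketch below computes a generic InfoNCE loss for a single anchor using cosine similarity. It is not the method proposed in the cited article; the function names, toy vectors, and temperature value are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.5):
    """InfoNCE loss for one anchor: -log(e^{s+/t} / (e^{s+/t} + sum e^{s-/t}))."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


anchor, positive = [1.0, 0.0], [0.9, 0.1]
true_negatives = [[-1.0, 0.0], [0.0, 1.0]]
# The first "negative" is nearly identical to the anchor: a false negative.
with_false_negative = [[0.95, 0.05], [0.0, 1.0]]

print(info_nce(anchor, positive, true_negatives))
print(info_nce(anchor, positive, with_false_negative))
```

The second loss is larger, and minimizing it would push the anchor away from a sample of its own class. This is the sampling bias described above; on graphs, where augmented views and neighborhoods overlap heavily, such false negatives are hard to rule out by random sampling alone.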
Predicting the interaction affinity between drugs and target proteins (drug-target affinity, DTA) is crucial for rapid and accurate drug discovery and repositioning. Consequently, more accurate DTA prediction has become a key research area in drug discovery and drug repositioning. However, traditional experimental methods involve long operation cycles, high manpower requirements, and high economic costs, making it difficult to predict specific interactions between drugs and target proteins quickly and accurately.
Nucleic Acids Res
February 2024
Analysis of functional molecular modules (i.e., gene-miRNA co-modules and gene-miRNA-lncRNA triple-layer modules) can dissect the complex regulation underlying etiology or phenotypes.
Transcription factors (TFs), transcription co-factors (TcoFs) and their target genes perform essential functions in diseases and biological processes. KnockTF 2.0 (http://www.
Comput Biol Med
November 2023
Colorectal cancer (CRC) is the most prevalent malignant tumor of the digestive system. It is a formidable global health challenge, ranking as the fourth leading cause of cancer-related deaths worldwide. Despite considerable advances in understanding and treating CRC, tumor recurrence and metastasis remain major causes of high morbidity and mortality during treatment.
Accurate prediction of equipment operation trends plays an important role in ensuring safe equipment operation and reducing maintenance costs. Monitoring equipment vibration and predicting the vibration trend time series is therefore an effective means of preventing equipment failures. To reduce the error of equipment operation trend prediction, this paper proposes a method that combines signal decomposition with an Informer prediction model.
Determining cell types from single-cell transcriptomics data is fundamental for downstream analysis. However, cell clustering and data imputation still face computational challenges due to the high dropout rate, sparsity, and dimensionality of single-cell data. Although some deep learning-based solutions have been proposed to handle these challenges, they still cannot leverage gene attribute information and cell topology in a sensible way to achieve consistent clustering.
BMC Bioinformatics
April 2023
Motivation: Gene regulatory networks (GRNs) arise from the intricate interactions between transcription factors (TFs) and their target genes during the growth and development of organisms. Inferring GRNs can reveal the underlying gene interactions in living systems and facilitate investigation of the relationship between gene expression patterns and phenotypic traits. Although several machine-learning models have been proposed for inferring GRNs from single-cell RNA sequencing (scRNA-seq) data, some of them, such as Boolean and tree-based networks, are sensitive to noise and may struggle to handle the high noise and dimensionality of real scRNA-seq data, as well as the sparsity of gene regulatory relationships.
Finding the causal structure among a set of variables from observational data is a crucial task in many scientific areas. Most algorithms focus on discovering the global causal graph, whereas few efforts have been made toward the local causal structure (LCS), which is of wide practical significance and easier to obtain. LCS learning faces the challenges of neighborhood determination and edge orientation.
Discovering cooperative driver pathways helps researchers study the pathogenesis of cancer. However, most discovery methods focus mainly on genomic data and neglect known pathway information and other related multi-omics data; thus, they cannot faithfully decipher the carcinogenic process. We propose CDPMiner (Cooperative Driver Pathways Miner) to discover cooperative driver pathways by multiplex network embedding, which can jointly model relational and attribute information of multi-type molecules.
Motivation: Predicting associations between human microbes and drugs (MDAs) is a critical step in drug development and precision medicine. Because discovering these associations through wet-lab experiments is time-consuming and labor-intensive, computational methods have become an effective way to tackle this problem. Recently, graph contrastive learning (GCL) approaches have shown great advantages in learning node embeddings from heterogeneous biological graphs (HBGs).
Motivation: Circular RNA (circRNA) is a class of noncoding RNA with high conservation and stability, and is considered an important disease biomarker and drug target. Accumulating evidence indicates that circRNA plays a crucial role in the pathogenesis and progression of many complex diseases. Because biological experiments are time-consuming and labor-intensive, an accurate computational prediction method has become indispensable for identifying disease-related circRNAs.
Motivation: High-resolution annotation of gene functions is a central task in functional genomics. Multiple proteoforms translated from alternatively spliced isoforms of a single gene are the actual performers of function and greatly increase functional diversity. The specific functions of different isoforms can decipher the molecular basis of various complex diseases at a finer granularity.
Personalized federated learning (PFL) learns a personalized model for each client in a decentralized manner, where each client owns private data that are not shared and data among clients are non-independent and identically distributed (i.i.d.
Different cancer types share common characteristics but also have their own specific ones. The mechanisms underlying these specific and common characteristics remain unclear. Pan-cancer analysis can help clarify the similarities and differences among cancer types by systematically describing different patterns across cancers and identifying cancer-specific and cancer-common molecular biomarkers.
With the development of high-throughput genotyping technology, detecting single nucleotide polymorphism (SNP)-SNP interactions (SSIs) has become essential for understanding disease susceptibility. Various methods have been proposed to detect SSIs. However, given disease complexity and the bias of individual SSI detectors, these single-detector-based methods generally do not scale to real genome-wide data and yield unfavorable results.
Genome-phenome association (GPA) prediction can promote understanding of the biological mechanisms underlying the complex pathology of phenotypes (i.e., traits and diseases).
Predicting differentially expressed genes (DEGs) from epigenetic signal data is key to understanding how epigenetics controls cell functional heterogeneity through gene regulation. This knowledge can help in developing 'epigenetics drugs' for complex diseases such as cancers. Most existing machine learning-based methods have shortcomings in prediction accuracy, interpretability, or training speed.
DNA N6-methyladenine (6mA) is a common epigenetic modification that plays significant roles in the growth and development of plants. Identifying 6mA sites is crucial for elucidating the functions of 6mA. In this article, a novel model named i6mA-vote is developed to predict 6mA sites in plants.
IEEE/ACM Trans Comput Biol Bioinform
April 2023
Predicting differential gene expression (DGE) from histone modification (HM) signals is crucial to understanding how HMs control cell functional heterogeneity by influencing differential gene regulation. Most existing prediction methods represent HM signals with fixed-length bins and feed these bins into a single machine learning model to predict differentially expressed genes for a single cell type or cell type pair. However, an inappropriate bin length may split an important HM segment across bins and lead to information loss.
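The splitting problem with fixed-length bins can be shown on a toy signal. The sketch below assumes a per-position HM signal stored as a plain list; it averages the signal over fixed-length bins and reports which bins a segment overlaps. The helper names, the 200-bp region, and the segment coordinates are hypothetical and not taken from the article.

```python
def fixed_bins(signal, bin_len):
    """Average a per-position signal over consecutive fixed-length bins.

    The final bin may be shorter if len(signal) is not a multiple of bin_len.
    """
    means = []
    for start in range(0, len(signal), bin_len):
        chunk = signal[start:start + bin_len]
        means.append(sum(chunk) / len(chunk))
    return means


def bins_touched(start, end, bin_len):
    """Indices of the bins overlapped by the half-open segment [start, end)."""
    return list(range(start // bin_len, (end - 1) // bin_len + 1))


# Toy 200-bp region with one enriched HM segment at positions 90-129.
signal = [0.0] * 200
for pos in range(90, 130):
    signal[pos] = 1.0

print(bins_touched(90, 130, 100))  # → [0, 1]: segment split across two bins
print(fixed_bins(signal, 100))     # → [0.1, 0.3]: each bin sees only part of it
print(bins_touched(90, 130, 200))  # → [0]: a longer bin keeps it whole...
print(fixed_bins(signal, 200))     # → [0.2]: ...but dilutes it with background
```

With 100-bp bins the 40-bp segment is split and neither bin mean reflects its full strength; a 200-bp bin keeps it in one bin but averages it with background. Either way, part of the segment's information is lost, which is the problem the abstract attributes to an inappropriate bin length.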