Publications by authors named "ShuangSang Fang"

Three-dimensional spatial transcriptomics has revolutionized our understanding of tissue regionalization, organogenesis, and development. However, existing approaches overlook either spatial information or experiment-induced distortions, leading to significant discrepancies between reconstructed and in vivo cell locations and, in turn, to unreliable downstream analysis. To address these challenges, we propose ST-GEARS (Spatial Transcriptomics GEospatial profile recovery system through AnchoRS).

Background: Integrative analysis of spatially resolved transcriptomics datasets empowers a deeper understanding of complex biological systems. However, integrating multiple tissue sections presents challenges for batch effect removal, particularly when the sections are measured by various technologies or collected at different times.

Findings: We propose spatiAlign, an unsupervised contrastive learning model that employs the expression of all measured genes and the spatial location of cells, to integrate multiple tissue sections.

This paper introduces a new approach to cell clustering based on the Variable Neighborhood Search (VNS) metaheuristic, which groups cells using both gene expression and spatial coordinates. We initially formulated this clustering problem as an Integer Linear Programming minimization problem.
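
To illustrate how a VNS loop alternates random "shaking" with local descent, here is a minimal sketch. It is not the authors' method: it assumes a simple within-cluster sum-of-squares objective over concatenated expression and weighted spatial features instead of their ILP formulation, and all function names and parameters (`combine_features`, `spatial_weight`, `k_max`, `iterations`) are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def combine_features(expression: Sequence[Sequence[float]],
                     coords: Sequence[Sequence[float]],
                     spatial_weight: float = 1.0) -> List[List[float]]:
    """Concatenate each cell's expression vector with its weighted spatial coordinates."""
    return [list(e) + [spatial_weight * c for c in xy] for e, xy in zip(expression, coords)]


def sse(points: List[List[float]], labels: List[int], k: int) -> float:
    """Within-cluster sum of squared distances to each centroid (empty clusters ignored)."""
    dim = len(points[0])
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total


def local_search(points, labels, k) -> Tuple[List[int], float]:
    """Descent: reassign single cells to other clusters while the objective decreases."""
    labels = labels[:]
    best = sse(points, labels, k)
    improved = True
    while improved:
        improved = False
        for i in range(len(points)):
            current = labels[i]
            for c in range(k):
                if c == current:
                    continue
                labels[i] = c
                cost = sse(points, labels, k)
                if cost < best - 1e-12:
                    best, current, improved = cost, c, True  # keep the move
                else:
                    labels[i] = current  # undo the move
    return labels, best


def shake(labels, k, n_moves, rng):
    """Random perturbation: reassign n_moves distinct cells to random clusters."""
    labels = labels[:]
    for i in rng.sample(range(len(labels)), n_moves):
        labels[i] = rng.randrange(k)
    return labels


def vns_cluster(points, k, k_max=3, iterations=30, seed=0):
    """Basic VNS: shake in growing neighborhoods, descend, accept only improvements."""
    rng = random.Random(seed)
    labels, best = local_search(points, [rng.randrange(k) for _ in points], k)
    for _ in range(iterations):
        n = 1
        while n <= k_max:
            candidate, cost = local_search(points, shake(labels, k, n, rng), k)
            if cost < best - 1e-12:
                labels, best, n = candidate, cost, 1  # improvement: back to smallest neighborhood
            else:
                n += 1  # no improvement: try a larger neighborhood
    return labels, best


expression = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
coords = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
points = combine_features(expression, coords, spatial_weight=0.5)
labels, cost = vns_cluster(points, k=2)
```

The `spatial_weight` knob is one simple way to trade off expression similarity against spatial proximity; an exact ILP solver would replace the heuristic descent on small instances.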

Article Synopsis
  • Stereo-seq enhances spatially resolved transcriptomics by analyzing large tissues at the single-cell level, achieving high resolution on both cellular and subcellular scales.
  • The software STCellbin improves upon previous versions by integrating cell membrane/wall staining images for enhanced accuracy in capturing single-cell spatial gene expression profiles.
  • STCellbin has been validated with mouse liver and seed datasets, proving to be more reliable than earlier methods and aiding in a better understanding of tissue biology through single-cell analysis.

Unlabelled: As genomic sequencing technology continues to advance, joint analysis of multiple transcriptomic datasets is becoming increasingly important. However, batch effects complicate dataset integration, for example when data are sequenced on different platforms or collected at different times. Here, we report the development of BatchEval Pipeline, a workflow for evaluating batch effects in dataset integration.

Basic spatial transcriptomics analysis requires obtaining gene expression information at both the spatial and the cellular level. Existing tools for these analyses suffer from performance problems on large datasets, including computationally intensive spatial localization and RNA-to-genome alignment, as well as excessive memory usage on large chips.

Background: The emergence of high-resolution spatial transcriptomics (ST) has facilitated the development of novel methods for investigating biological development, organism growth, and other complex biological processes. However, high-resolution, whole-transcriptome ST datasets require customized imputation methods to improve their signal-to-noise ratio and data quality.

Findings: We propose an efficient and adaptive Gaussian smoothing (EAGS) imputation method for high-resolution ST.
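
The basic idea behind Gaussian-smoothing imputation can be sketched as a spatially weighted average: each spot borrows expression from nearby spots with weights that decay with distance. This sketch uses a fixed bandwidth `sigma` and cutoff `radius` (both hypothetical parameters); it does not reproduce the efficient and adaptive components of EAGS.

```python
import math
from typing import List, Optional, Sequence


def gaussian_smooth(coords: Sequence[Sequence[float]],
                    expression: Sequence[Sequence[float]],
                    sigma: float = 1.0,
                    radius: Optional[float] = None) -> List[List[float]]:
    """Replace each spot's expression with a Gaussian-weighted mean over nearby spots.

    Weight of neighbor j for spot i: exp(-d_ij^2 / (2 * sigma^2)), only if d_ij <= radius.
    A spot always contributes to itself with weight 1, so the denominator is never zero.
    """
    if radius is None:
        radius = 3.0 * sigma
    n_genes = len(expression[0])
    smoothed = []
    for xi, yi in coords:
        total_weight = 0.0
        acc = [0.0] * n_genes
        for (xj, yj), values in zip(coords, expression):
            d2 = (xj - xi) ** 2 + (yj - yi) ** 2
            if d2 > radius ** 2:
                continue  # outside the neighborhood
            w = math.exp(-d2 / (2.0 * sigma ** 2))
            total_weight += w
            for g in range(n_genes):
                acc[g] += w * values[g]
        smoothed.append([a / total_weight for a in acc])
    return smoothed


# A dropout (0.0) at the origin surrounded by expressing spots, plus one isolated spot.
coords = [(0, 0), (1, 0), (0, 1), (10, 10)]
expression = [[0.0], [4.0], [4.0], [7.0]]
out = gaussian_smooth(coords, expression, sigma=1.0)
```

The dropout at the origin is pulled toward its neighbors' values, while the isolated spot keeps its own value because no other spot falls within the cutoff.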

Background: Cell clustering is a pivotal aspect of spatial transcriptomics (ST) data analysis, as it forms the foundation for subsequent data mining. Recent advances in spatial domain identification have combined graph neural network (GNN) approaches with spatial transcriptomics data. However, such GNN-based methods suffer from representation collapse, in which all spatial spots are projected onto a single representation.

The development of spatial transcriptomics (ST) technologies has extended genetic research from the single-cell data level to a two-dimensional spatial coordinate system and facilitated the study of the composition and function of various cell subsets in different environments and organs. The large-scale spatial gene expression data generated by these technologies call for spatially resolved approaches that meet the requirements of computational and biological data interpretation. These requirements include handling the explosive growth of data to determine cell-level and gene-level expression; correcting inner batch effects and loss of expression to improve data quality; performing efficient interpretation and in-depth knowledge mining at both the single-cell and tissue-wide levels; and conducting multi-omics integration analysis to provide an extensible framework for an in-depth understanding of biological processes.

Article Synopsis
  • Transcription co-factors (TcoFs) are vital for gene expression regulation, linking enhancers to promoters.
  • The TcoFBase database was created to compile extensive data on TcoFs, including 2322 TcoFs and 6759 ChIP-seq datasets from over 500 human and mouse tissues/cell types.
  • TcoFBase offers various analyses, such as enrichment and regulatory network analysis, to help users understand TcoFs' functions and their roles in gene regulation.

Gene set enrichment (GSE) analysis plays an essential role in extracting biological insight from genome-scale experiments. Overrepresentation analysis (ORA), functional class scoring (FCS), and pathway topology (PT) approaches represent three successive generations of GSE methods. Previous versions of KOBAS provided services based only on the ORA method.
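
As a point of reference for what ORA computes, a common formulation is a one-sided hypergeometric test on the overlap between a gene list and a pathway. The sketch below is the generic textbook version, not KOBAS's implementation; the function name and arguments are illustrative, and multiple-testing correction across pathways is omitted.

```python
from math import comb


def ora_pvalue(universe_size: int, pathway_size: int, list_size: int, overlap: int) -> float:
    """One-sided hypergeometric p-value P(X >= overlap).

    X is the number of pathway genes in a random list of `list_size` genes drawn
    without replacement from `universe_size` background genes, of which
    `pathway_size` belong to the pathway.
    """
    upper = min(pathway_size, list_size)
    tail = sum(comb(pathway_size, i) * comb(universe_size - pathway_size, list_size - i)
               for i in range(overlap, upper + 1))
    return tail / comb(universe_size, list_size)


# 5 of 200 query genes fall in a 100-gene pathway, against a 20,000-gene background.
p = ora_pvalue(universe_size=20000, pathway_size=100, list_size=200, overlap=5)
```

Exact integer arithmetic via `math.comb` avoids floating-point underflow in the binomial coefficients; the expected overlap here is only 1 gene, so observing 5 yields a small p-value.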

Molecular characteristics can be good indicators of tumor prognosis and have been introduced into the classification of gliomas. The prognosis of patients with newly classified lower-grade gliomas (LGGs, including grade 2 and grade 3 gliomas) is highly heterogeneous, and new molecular markers are urgently needed. Autophagy-related genes (ATGs) were obtained from the Human Autophagy Database (HADb).

Next-generation sequencing is increasingly being adopted as a valuable method for the detection of somatic variants in clinical oncology. However, it is still challenging to reach a satisfactory level of robustness and standardization in clinical practice when using the currently available bioinformatics pipelines to detect variants from raw sequencing data. Moreover, appropriate reference data sets are lacking for clinical bioinformatics pipeline development, validation, and proficiency testing.

Pharmacotranscriptomics has become a powerful approach for evaluating the therapeutic efficacy of drugs and discovering new drug targets. Recently, studies of traditional Chinese medicine (TCM) have increasingly turned to high-throughput transcriptomic screens to characterize the molecular effects of herbs and ingredients. Numerous studies have also examined gene targets of herbs and ingredients and linked them to various modern diseases.

NONCODE (http://www.noncode.org/) is a comprehensive database for the collection and annotation of noncoding RNAs, especially long non-coding RNAs (lncRNAs), in animals.

Studies have shown that microRNAs (miRNAs) play a vital role in tumor progression and patient prognosis. We therefore aimed to construct a miRNA-based model for predicting the survival of hepatocellular carcinoma (HCC) patients. Gene expression data from 433 HCC patients in The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus public databases were re-mined using survival analysis and receiver operating characteristic (ROC) curves.
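
For readers unfamiliar with ROC analysis, the area under the ROC curve (AUC) equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below assumes a simple binary outcome (for example, death within a fixed follow-up window) and a precomputed risk score; survival studies often use time-dependent ROC methods instead, which this sketch does not implement.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney pairwise formulation.

    AUC = P(score of a random positive > score of a random negative); ties count 1/2.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores for four patients; 1 = event observed, 0 = no event.
auc = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

The pairwise loop is quadratic in sample size, which is fine for cohorts of a few hundred patients; a rank-based formula gives the same value in O(n log n).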

Although next-generation sequencing has produced ever more high-throughput data, classifying RNA transcripts as protein-coding or non-coding remains challenging, especially for poorly annotated species. We upgraded our original coding potential calculator, CNCI (Coding-Non-Coding Index), to CNIT (Coding-Non-Coding Identifying Tool), which provides faster and more accurate evaluation of the coding ability of RNA transcripts. CNIT runs ∼200 times faster than CNCI and is more accurate than CNCI (0.

Article Synopsis
  • The pharmaceutical industry has recently focused on phenotypic drug discovery (PDD), which is based on observing changes in disease symptoms or phenotypes, with traditional Chinese medicine (TCM) highlighted as a particularly promising source.
  • A new database called SymMap has been developed to connect TCM symptoms with herbs and modern medical conditions, relying on the insights of a committee of 17 TCM experts, and includes extensive data on diseases, herbal ingredients, and target genes.
  • By creating a comprehensive network from this curated information, SymMap enables researchers to better rank and evaluate potential drug candidates, aiding in the drug discovery process, and is available for access online.

RNA editing is a post-transcriptional event that leads to transcriptome diversity and has been shown to play important roles in tumorigenesis. However, the dynamic changes and functional significance of editing events across cancer stages have not yet been characterized systematically. In this paper, we describe a comprehensive study of the RNA editome of four samples from different cancer stages of the same patient, based on analysis of both whole-genome and transcriptome sequencing data.

NONCODE (http://www.bioinfo.org/noncode/) is a systematic database that is dedicated to presenting the most complete collection and annotation of non-coding RNAs (ncRNAs), especially long non-coding RNAs (lncRNAs).

NONCODE is a comprehensive database that aims to present the most complete collection and annotation of non-coding RNAs, especially long non-coding RNAs (lncRNA genes), and thus NONCODE is essential to modern biological and medical research. Scientists are producing a flood of new data from which new lncRNA genes and lncRNA-disease relationships are continually being identified. NONCODE assimilates such information from a wide variety of sources including published articles, RNA-seq data, micro-array data and databases on genetic variation (dbSNP) and genome-wide associations (GWAS).

Long non-coding RNAs are known to be involved in cancer progression, but their biological functions and prognostic value in diffuse large B-cell lymphoma remain largely unexplored. In this study, long non-coding RNA expression was characterized in 1,403 normal and diffuse large B-cell lymphoma samples by repurposing 7 microarray datasets. Compared with normal B cells at any stage, NONHSAG026900 expression was significantly decreased in tumor samples.

RNA-seq technology offers the promise of rapid, comprehensive discovery of long intervening noncoding RNAs (lincRNAs). Basic tools such as TopHat and Cufflinks have been widely used for RNA-seq assembly. However, advanced bioinformatics methodologies that allow in-depth analysis of lincRNAs are lacking.

NONCODE (http://www.bioinfo.org/noncode/) is an interactive database that aims to present the most complete collection and annotation of non-coding RNAs, especially long non-coding RNAs (lncRNAs).

Article Synopsis
  • Mammalian genomes contain a large number of long non-coding RNAs (lncRNAs), which are important for various biological functions; however, many species lack detailed lncRNA transcript information.
  • Using RNA sequencing from multiple tissues across nine species, researchers created extensive catalogs of lncRNAs, discovering notable differences in lncRNA expression and conservation compared to protein-coding genes.
  • The study includes a database called PhyloNONCODE to help scientists explore lncRNA evolution and expression, providing a valuable resource for understanding these non-coding genetic elements.