Publications by authors named "Yao-zhong Zhang"

Oxford Nanopore Technologies (ONT) offers ultrahigh-throughput multi-sample sequencing but only provides barcode kits that enable up to 96-sample multiplexing. We present TDFPS-Designer, a new toolkit for nanopore sequencing barcode design, which creates significantly more barcodes: 137 with a length of 20 base pairs, 410 at 24 bp, and 1779 at 30 bp, far surpassing ONT's offerings. It includes GPU-based acceleration for ultra-fast demultiplexing and designs robust barcodes suitable for high-error ONT data.

Article Synopsis
  • Genomic sequences are usually shown as strings of the letters A, C, G, and T, but new methods are exploring image representations like Chaos Game Representation (CGR) and read pileup images.
  • The rise of deep learning in computer vision and natural language processing has sparked interest in applying these image-based techniques to analyze genomic data.
  • The review highlights three key applications of deep learning with image processing in genome analysis, focusing on the benefits and effectiveness of these innovative approaches.
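
As a rough illustration of the Chaos Game Representation mentioned in the synopsis above, the sketch below maps a nucleotide string to points in the unit square by repeatedly moving halfway toward the corner assigned to each base. The corner layout and the function name `cgr_points` are assumptions made for this example and are not taken from the review.

```python
# Corner layout is one common convention; it is an assumption of this sketch.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return one (x, y) point per unambiguous base in seq."""
    x, y = 0.5, 0.5  # start at the center of the unit square
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the base's corner.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points
```

Binning these points into a 2D histogram would give an image-like input of the kind a convolutional network expects.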

Summary: Functional interpretation of biological entities such as differentially expressed genes is one of the fundamental analyses in bioinformatics. The task can be addressed by using biological pathway databases with enrichment analysis (EA). However, textual descriptions of biological entities in public databases remain underexplored and poorly integrated in existing tools, although they have the potential to reveal new mechanisms.
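
For readers unfamiliar with enrichment analysis, one common formulation is a one-sided hypergeometric test on the overlap between a selected gene list and a pathway. The sketch below assumes that formulation; the abstract does not say which test the tool uses, and the function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_pathway, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn at random
    without replacement from a universe that contains n_pathway pathway genes."""
    total = comb(n_universe, n_selected)
    upper = min(n_pathway, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

A small p-value suggests the pathway is over-represented in the selected list; in practice the p-values would also be corrected for multiple testing across pathways.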


Single-cell RNA-sequencing (scRNA-seq) is a powerful technique that provides high-resolution expression profiling of individual cells. It significantly advances our understanding of cellular diversity and function. Despite its potential, the analysis of scRNA-seq data poses considerable challenges related to multicollinearity, data imbalance, and batch effect.

Article Synopsis
  • Recent advancements in transformer architecture pre-training have improved performance in various tasks, but the mechanisms behind these improvements, especially in biological data, remain unclear.
  • This study dissects the BERT model's embedding and encoding components to determine how it learns from nucleotide sequences, revealing that k-mer embeddings are effectively captured during pre-training.
  • The research shows that pre-trained k-mer embeddings can outperform traditional one-hot encoding for nucleotide representation and can enhance performance in downstream tasks when combined with simple models.
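
To make the two nucleotide representations contrasted above concrete, the sketch below tokenizes a sequence into k-mers and, separately, one-hot encodes it. Overlapping k-mers with stride 1, the default `k=3`, and both function names are assumptions of this example, not details taken from the study.

```python
def kmer_tokens(seq, k=3):
    """Split seq into overlapping k-mers (stride 1); each k-mer would be
    looked up in a learned embedding table in a BERT-style model."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def one_hot(seq, alphabet="ACGT"):
    """Encode each base as a fixed 4-dimensional indicator vector."""
    return [[1 if base == letter else 0 for letter in alphabet] for base in seq]
```

One-hot vectors are fixed and carry no context, whereas k-mer embeddings are learned parameters that pre-training can shape.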

The human microbiome plays a crucial role in human health and is associated with a number of human diseases. Determining the functional roles of the microbiome in human diseases remains a biological challenge because of the high dimensionality of metagenome gene features. However, existing models offer limited biological interpretability, leaving the functional role of microbes in human diseases unexplored.


Accurately identifying phage-host relationships from genome sequences remains challenging, especially for phages and hosts that share few homologous sequences. In this work, focusing on identifying phage-host relationships at the species and genus levels, we propose a contrastive-learning-based approach to learn whole-genome sequence embeddings that take phage-host interactions (PHIs) into account. Contrastive learning is used to place phages that infect the same hosts close to each other in the new representation space.
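
The idea of pulling together phages that infect the same host can be sketched with a classic pairwise contrastive loss. The abstract does not state which loss the authors use, so the margin-based form, the `margin` default, and the function name below are assumptions for illustration only.

```python
import math


def contrastive_loss(emb_a, emb_b, same_host, margin=1.0):
    """Pairwise contrastive loss on two embedding vectors.

    Pairs that share a host are penalized by their squared distance;
    other pairs are penalized only when closer than `margin`.
    """
    d = math.dist(emb_a, emb_b)  # Euclidean distance
    if same_host:
        return d ** 2
    return max(0.0, margin - d) ** 2
```

Minimizing this over many pairs moves same-host phages closer together and pushes different-host phages at least `margin` apart.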


Objective: Based on ultrasound (US) images, this study aimed to detect and quantify calcifications of thyroid nodules, which are regarded as one of the most important features in US diagnosis of thyroid cancer, and to further investigate the value of US calcifications in predicting the risk of lymph node metastasis (LNM) in papillary thyroid cancer (PTC).

Methods: Using DeepLabv3+ networks, 2992 thyroid nodules in US images were used to train a model to detect thyroid nodules, and 998 of these were also used to train a model to detect and quantify calcifications. The models were tested on 225 and 146 thyroid nodules obtained from two centers, respectively.


Motivation: Bacteriophages (phages) are viruses that infect and replicate within bacteria and archaea, and they are abundant in the human body. To investigate the relationship between phages and microbial communities, the first step is to identify phages from metagenome sequences. Currently, there are two main methods for identifying phages: database-based (alignment-based) methods and alignment-free methods.


A versatile hydrophilic and antifouling coating was designed and prepared based on catechol-modified four-arm polyethylene glycol. The dopamine (DA) molecules were grafted onto the end of the four-arm polyethylene glycol carboxyl (4A-PEG-COOH) through the amidation reaction, which was proven by ¹H NMR and FTIR analysis, assisting the strong adhesion of PEG on the surface of various types of materials, including metallic, inorganic, and polymeric materials. The reduction of the water contact angle and the bacteria-repellent and protein-repellent effects indicated that the coating had good hydrophilicity and antifouling performance.


Objectives: From the viewpoint of ultrasound (US) physicians, an ideal thyroid US computer-assisted diagnostic (CAD) system for thyroid cancer should perform well in suspicious thyroid nodules with atypical risk features and be able to output explainable results. This study aims to develop an explainable US CAD model for suspicious thyroid nodules.

Methods: A total of 2992 solid or almost-solid thyroid nodules were analyzed retrospectively.


Read depths (RDs) are frequently used to identify structural variants (SVs) from sequencing data. Existing RD-based SV callers struggle to determine breakpoints at single-nucleotide resolution because RD data are noisy and the calculation is bin-based. In this paper, we propose to use the deep segmentation model UNet to learn base-wise RD patterns surrounding breakpoints of known SVs.
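
The difference between base-wise and bin-based read depth can be shown with a small sketch. It assumes reads are given as half-open `(start, end)` intervals on a single reference; the function names are hypothetical and this is not the authors' pipeline.

```python
def base_depth(reads, length):
    """Per-base read depth over [0, length) from half-open (start, end) reads."""
    diff = [0] * (length + 1)
    for start, end in reads:
        # Clamp coordinates so out-of-range reads cannot index past the array.
        s = min(max(start, 0), length)
        e = min(max(end, 0), length)
        diff[s] += 1
        diff[e] -= 1
    depth, running = [], 0
    for i in range(length):
        running += diff[i]  # prefix sum turns the difference array into depth
        depth.append(running)
    return depth


def binned_depth(depth, bin_size):
    """Mean depth per bin; positions inside a bin become indistinguishable."""
    bins = []
    for i in range(0, len(depth), bin_size):
        chunk = depth[i:i + bin_size]
        bins.append(sum(chunk) / len(chunk))
    return bins
```

With bins, a depth change can only be located to within `bin_size` bases, which is why a base-wise signal is needed for single-nucleotide breakpoints.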


Motivation: Digital pathology supports large-scale analysis of histopathological images using deep learning methods. However, applications of deep learning in this area have been limited by the complexities of configuring the computational environment and optimizing hyperparameters, which hinder deployment and reduce reproducibility.

Results: Here, we propose HEAL, a deep learning-based automated framework for easy, flexible and multi-faceted histopathological image analysis.


Background & Aims: Fecal microbiota transplantation (FMT) is an effective therapy for recurrent Clostridioides difficile infection (rCDI). However, the overall mechanisms underlying FMT success await comprehensive elucidation, and the safety of FMT has recently become a serious concern because of the occurrence of drug-resistant bacteremia transmitted by FMT. We investigated whether functional restoration of the bacteriomes and viromes by FMT could be an indicator of successful FMT.


Background: Nanopore sequencing is a rapidly developing third-generation sequencing technology that can generate long nucleotide reads in real time on a portable device. Genotypes are determined by detecting changes in ionic current signals as a DNA/RNA fragment passes through a nanopore. Currently, nanopore basecalling has a higher error rate than basecalling of short-read sequencing.


Aim: Villoglandular adenocarcinoma (VGA) of the uterine cervix is a variant of endocervical adenocarcinoma. However, the clinicopathologic and immunohistochemical features of VGA are still unclear. The aim of this study was to investigate the clinicopathologic and immunohistochemical features of VGA.


ASXL1 plays key roles in epigenetic regulation of gene expression through methylation of histone H3K27, and disruption of ASXL1 drives myeloid malignancies, at least in part, via derepression of posterior HOXA loci. However, little is known about the identity of proteins that interact with ASXL1 and about the functions of ASXL1 in modulation of the active histone mark, such as H3K4 methylation. In this study, we demonstrate that ASXL1 is a part of a protein complex containing HCFC1 and OGT; OGT directly stabilizes ASXL1 by O-GlcNAcylation.


Caveolin-1 (Cav-1), a membrane protein involved in the formation of caveolae, binds steroid receptors and endothelial nitric oxide synthase, limiting its translocation and activation. In the present study, we investigated the role of Cav-1 in the progression of hepatic fibrosis induced by carbon tetrachloride (CCl₄) in mice. Wild-type (WT) and Cav-1-knockout (Cav-1⁻/⁻) mice were used in our study and subjected to CCl₄.


The p53 protein is a sophisticated transcription factor that regulates dozens of target genes simultaneously in accordance with the cellular circumstances. Although considerable efforts have been made to elucidate the functions of p53-induced genes, a holistic understanding of the orchestrated signaling network repressed by p53 remains elusive. Here, we performed a systematic analysis to identify simultaneously regulated p53-repressed genes in breast cancer cells.


p53 encodes a transcription factor that transactivates downstream target genes involved in tumour suppression. Although osteosarcoma frequently has p53 mutations, the role of p53 in osteosarcomagenesis is not fully understood. To explore p53 target genes comprehensively in calvarial bone and identify novel druggable p53 target genes for osteosarcoma, we performed RNA sequencing using the calvarial bone and 23 other tissues from p53⁺/⁺ and p53⁻/⁻ mice after radiation exposure.


Although recent cancer genomics studies have identified a large number of genes that are mutated in human cancers, p53 remains the most frequently mutated gene. To further elucidate the p53-signalling network, we performed transcriptome analysis on 24 tissues in p53⁺/⁺ or p53⁻/⁻ mice after whole-body X-ray irradiation. Here we found transactivation of a total of 3551 genes in one or more of the 24 tissues only in p53⁺/⁺ mice, while 2576 genes were downregulated.


Objective: To explore the correlations of circulating tumor cells (CTCs) and disseminated tumor cells (DTCs) with the clinicopathological characteristics, prognostic events, and survival outcomes in esophageal cancer (EC) patients.

Methods: The PubMed, Web of Science, Embase, and Cochrane databases were searched for studies reporting the outcomes of interest. The studies were selected according to established inclusion/exclusion criteria.


Background: The recent success of deep learning techniques in machine learning and artificial intelligence has stimulated a great deal of interest among bioinformaticians, who now wish to bring the power of deep learning to bear on a host of bioinformatics problems. Deep learning is ideally suited for biological problems that require automatic or hierarchical feature representation for biological data when prior knowledge is limited. In this work, we address the sequence-specific bias correction problem for RNA-seq data using Recurrent Neural Networks (RNNs) to model nucleotide sequences without pre-determining sequence structures.
