Publications by authors named "Yu-Yen Ou"

Adenosine triphosphate plays a vital role in providing energy and enabling key cellular processes through interactions with binding proteins. Experimental identification of adenosine triphosphate-binding residues remains challenging, and the increasing amount of protein sequence data calls for computational methods to identify binding sites.

Secondary active transporters play a crucial role in cellular physiology by facilitating the movement of molecules across cell membranes. Identifying the functional classes of these transporters, particularly amino acid and peptide transporters, is essential for understanding their involvement in various physiological processes and disease pathways, including cancer. This study aims to develop a robust computational framework that integrates pre-trained protein language models and deep learning techniques to classify amino acid and peptide transporters within the secondary active transporter (SAT) family and predict their functional association with solute carrier (SLC) proteins.

Article Synopsis
  • This study introduces DeepNeoAG, a deep learning model that combines protein language models and convolutional neural networks to accurately predict neoantigens from tumor mutations.
  • DeepNeoAG outperforms existing methods, showing promise for the development of tailored cancer treatments.

Article Synopsis
  • The study introduces vesiMCNN, a new computational approach that combines pre-trained protein language models with a multi-window scanning CNN to accurately identify vesicular transport proteins.
  • The model achieves an MCC of 0.558 and an AUC-ROC of 0.933, outperforming previous methods, and a new benchmark dataset has been created to support future research in this area.
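
The MCC (Matthews correlation coefficient) quoted above is computed from the counts of a binary confusion matrix and ranges from -1 to 1, which makes it informative on imbalanced data. A minimal sketch of the standard formula; the function name `mcc` and the example counts are hypothetical and not taken from the study.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts, for illustration only.
print(round(mcc(tp=90, tn=80, fp=20, fn=10), 3))  # 0.704
```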

Deciphering the mechanisms governing protein-DNA interactions is crucial for understanding key cellular processes and disease pathways. In this work, we present a powerful deep learning approach that significantly advances the computational prediction of DNA-interacting residues from protein sequences. Our method leverages the rich contextual representations learned by pre-trained protein language models, such as ProtTrans, to capture intrinsic biochemical properties and sequence motifs indicative of DNA binding sites.
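
Residue-level predictors built on language-model embeddings commonly classify each residue from a window of embeddings centred on it. A minimal sketch, assuming the per-residue embeddings (for example from ProtTrans) are already available as a list of vectors; the function name, the window size, and the zero padding at the termini are illustrative assumptions, not necessarily the settings used in this work.

```python
from typing import List

Vector = List[float]


def residue_windows(embeddings: List[Vector], half_width: int) -> List[List[Vector]]:
    """Return one window of 2*half_width + 1 embedding vectors per residue.

    Positions outside the sequence are zero-padded, so residues at the
    termini get windows of the same size as interior residues.
    """
    if not embeddings:
        return []
    dim = len(embeddings[0])
    length = len(embeddings)
    windows = []
    for centre in range(length):
        window = []
        for offset in range(-half_width, half_width + 1):
            pos = centre + offset
            # Copy real vectors; pad positions beyond either terminus with zeros.
            window.append(list(embeddings[pos]) if 0 <= pos < length else [0.0] * dim)
        windows.append(window)
    return windows


# Toy 2-dimensional "embeddings" for a 3-residue protein.
toy = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
for w in residue_windows(toy, half_width=1):
    print(w)
```

Each window would then be flattened or fed to a CNN to predict whether its centre residue interacts with DNA.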

Mitochondrial carriers (MCs) are essential proteins that transport metabolites across mitochondrial membranes and play a critical role in cellular metabolism. The ADP/ATP (adenosine diphosphate/adenosine triphosphate) carrier is one of the most important MCs, as it contributes to cellular energy production and is susceptible to the potent toxin bongkrekic acid. This toxin has claimed several lives; for example, a recent foodborne outbreak in Taipei, Taiwan, caused four deaths and sickened 30 people.

This study investigates the prediction of protein-peptide interactions using advanced machine learning techniques, comparing sequence-based models, standard CNNs, and traditional classifiers. Our approach, which leverages pre-trained language models and multi-view window scanning CNNs, yields significant improvements; among the language models, ProtTrans, pre-trained on 2.1 billion protein sequences comprising 393 billion amino acids, stands out.
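
The ProtTrans models published by Rostlab expect each input sequence as space-separated residues, with the rare amino acids U, Z, O, and B mapped to X. A minimal sketch of that preprocessing step only; loading the model and extracting embeddings (for example with the Hugging Face `transformers` library) is omitted, the upper-casing is an extra precaution, and the function name is hypothetical.

```python
import re


def prepare_for_prottrans(sequence: str) -> str:
    """Format a protein sequence as space-separated residues for a ProtTrans tokenizer."""
    # Extra precaution (assumption): normalize to upper case first.
    sequence = sequence.upper()
    # Map rare/ambiguous residues U, Z, O, B to the unknown residue X.
    sequence = re.sub(r"[UZOB]", "X", sequence)
    # Each space-separated residue becomes one token.
    return " ".join(sequence)


print(prepare_for_prottrans("MKTUAYB"))  # M K T X A Y X
```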

Accurate classification of membrane proteins like ion channels and transporters is critical for elucidating cellular processes and drug development. We present DeepPLM_mCNN, a novel framework combining Pretrained Language Models (PLMs) and multi-window convolutional neural networks (mCNNs) for effective classification of membrane proteins into ion channels and ion transporters. Our approach extracts informative features from protein sequences by utilizing various PLMs, including TAPE, ProtT5_XL_U50, ESM-1b, ESM-2_480, and ESM-2_1280.
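
The multi-window idea can be illustrated with a forward pass: filters of several widths slide over the per-residue PLM embeddings, each filter's output is passed through a ReLU and max-pooled over the whole sequence, and the pooled values are concatenated into one feature vector for a classifier. A minimal pure-Python sketch with one fixed filter per window size; the window sizes, weights, and function names are hypothetical, and a trained mCNN would learn many filters per size.

```python
from typing import Dict, List

Matrix = List[List[float]]  # rows = residues (or filter taps), columns = embedding dimensions


def conv1d_max(embeddings: Matrix, kernel: Matrix, bias: float = 0.0) -> float:
    """Valid 1-D convolution of one filter over the sequence, then ReLU and global max-pool.

    Returns 0.0 if the sequence is shorter than the filter.
    """
    width = len(kernel)
    best = 0.0  # ReLU outputs are >= 0, so 0.0 is a safe starting maximum
    for start in range(len(embeddings) - width + 1):
        total = bias
        for tap in range(width):
            row = embeddings[start + tap]
            total += sum(w * x for w, x in zip(kernel[tap], row))
        best = max(best, max(total, 0.0))  # ReLU, then running max-pool
    return best


def multi_window_features(embeddings: Matrix, kernels: Dict[int, Matrix]) -> List[float]:
    """Concatenate pooled outputs from filters of different window sizes."""
    return [conv1d_max(embeddings, kernels[size]) for size in sorted(kernels)]


# Toy 1-dimensional embeddings for a 4-residue sequence; hypothetical filters of width 2 and 3.
seq = [[1.0], [2.0], [3.0], [4.0]]
kernels = {2: [[1.0], [1.0]], 3: [[-1.0], [0.0], [1.0]]}
print(multi_window_features(seq, kernels))  # [7.0, 2.0]
```

Using several widths lets the network respond to sequence motifs of different lengths, while global max-pooling makes the feature vector independent of protein length.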

Article Synopsis
  • Secondary active transporters are crucial for ion and molecule transport in cells and are linked to diseases like cancer, but studying them through traditional biochemical methods is difficult.
  • We developed a computational method using pre-trained language models and deep learning to identify these transporters from membrane protein sequences, leveraging a dataset of 290 secondary active transporters and over 5,000 other proteins.
  • Our model, which combines ProtTrans language embeddings with a multi-window convolutional neural network, achieved high accuracy metrics (86% sensitivity, 99% specificity, 98% overall accuracy), showing that this approach surpasses traditional machine learning techniques and enhances membrane protein research.
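
With about 290 positives and over 5,000 negatives, overall accuracy is dominated by specificity. Writing accuracy in terms of sensitivity (Sn) and specificity (Sp), and approximating the negative set as N ≈ 5,000 (an assumption; the exact count is not given here), the reported figures are consistent with each other:

```latex
\mathrm{Acc} = \frac{TP + TN}{P + N}
             = \frac{P\,\mathrm{Sn} + N\,\mathrm{Sp}}{P + N}
             \approx \frac{290 \times 0.86 + 5000 \times 0.99}{290 + 5000}
             = \frac{249.4 + 4950}{5290}
             \approx 0.983
```

This agrees with the reported 98% accuracy and shows why sensitivity is the more telling figure for performance on the minority class.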

Membrane proteins play a crucial role in various cellular processes and are essential components of cell membranes. Because their complex structures and properties make them difficult to analyze experimentally, computational methods have emerged as a powerful tool for studying membrane proteins. Traditional features for protein sequence analysis, based on amino acid types, composition, and pair composition, have limitations in capturing higher-order sequence patterns.
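
For reference, the traditional composition features mentioned above are easy to compute: amino acid composition (AAC) gives 20 residue frequencies, and pair (dipeptide) composition gives 400 adjacent-pair frequencies. A minimal sketch; skipping non-standard residues is an assumption of this sketch.

```python
from itertools import product
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq: str) -> Dict[str, float]:
    """Fraction of each standard amino acid in the sequence (20 features)."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    total = 0
    for aa in seq.upper():
        if aa in counts:  # non-standard residues are skipped
            counts[aa] += 1
            total += 1
    return {aa: (c / total if total else 0.0) for aa, c in counts.items()}


def dipeptide_composition(seq: str) -> Dict[str, float]:
    """Fraction of each adjacent residue pair (400 features)."""
    seq = seq.upper()
    pairs = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in pairs:
            pairs[pair] += 1
            total += 1
    return {p: (c / total if total else 0.0) for p, c in pairs.items()}


aac = amino_acid_composition("MKKLLA")
dpc = dipeptide_composition("MKKLLA")
print(aac["K"], dpc["KK"], len(dpc))  # 0.3333333333333333 0.2 400
```

Both feature sets discard residue order beyond adjacent pairs, which is the limitation in capturing higher-order patterns noted above.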

The ability to predict 3D protein structures computationally has significantly advanced biological research. The AlphaFold protein structure database, developed by DeepMind, has provided a wealth of predicted protein structures and has the potential to bring about revolutionary changes in the field of life sciences. However, directly determining the function of proteins from their structures remains a challenging task.

Among cellular transport mechanisms, the movement of ions across the cell membrane, and its proper control, are essential to life processes. Ion transporters/pumps and ion channel proteins act as border guards, controlling the incessant traffic of ions across cell membranes. We revisited the classification of membrane proteins into transporters and ion channels using a more efficient deep learning approach.

Protein multiple sequence alignment information has long been an important source of features for inferring the functions of proteins from related sequences with known functions. It is also one of the underlying ideas of AlphaFold 2, a breakthrough model for predicting the three-dimensional structures of proteins from their primary sequences. Our study used protein multiple sequence alignment information, in the form of position-specific scoring matrices, as input.
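
A position-specific scoring matrix (PSSM) scores each amino acid at each alignment column by how much more often it appears there than expected by chance. A minimal sketch of a simplified log-odds PSSM computed directly from an alignment, assuming a uniform background frequency and a flat pseudocount; tools such as PSI-BLAST, which typically produce the PSSMs used as features, apply more elaborate sequence weighting and pseudocount schemes. The function name and toy alignment are hypothetical.

```python
import math
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def simple_pssm(alignment: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Log2-odds score of each amino acid at each alignment column.

    Assumes a uniform background frequency of 1/20; gaps ('-') and
    non-standard symbols are ignored when counting.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(len(alignment[0])):
        # Start every residue at the pseudocount so no score is log2(0).
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in alignment:
            aa = seq[col].upper()
            if aa in counts:
                counts[aa] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log2((c / total) / background) for aa, c in counts.items()})
    return pssm


msa = ["MKV", "MRV", "MK-"]
profile = simple_pssm(msa)
print(round(profile[0]["M"], 2), round(profile[0]["W"], 2))  # 1.8 -0.2
```

Conserved residues receive positive scores and absent ones negative scores; the resulting L x 20 matrix is what a downstream model reads as input.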

This study used k-mer embeddings as effective features to identify DNA N6-methyladenine sites in plant genomes and obtained improved performance without substantial effort in feature extraction, combination, and selection. Identifying DNA N6-methyladenine sites has been a very active topic in computational biology because suitable methods to identify them accurately, especially in plants, have been unavailable. Previous methods obtained substantial results only with great effort spent on extracting, heuristically searching, or fusing diverse types of features, not to mention a feature selection step.
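
The k-mer approach treats a DNA sequence as a "sentence" of overlapping k-mers, each mapped to a learned vector. A minimal sketch, assuming the embedding table has already been trained (for example with a word2vec-style model) and that a sequence is represented by the mean of its k-mer vectors; the value of k, the pooling choice, the function names, and the toy table are hypothetical, since the exact settings of the study are not given here.

```python
from typing import Dict, List


def kmers(seq: str, k: int) -> List[str]:
    """Overlapping k-mers with stride 1, e.g. 'ACGTA', k=3 -> ACG, CGT, GTA."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def mean_kmer_embedding(seq: str, k: int, table: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of all k-mers of the sequence found in the embedding table."""
    vectors = [table[m] for m in kmers(seq, k) if m in table]
    if not vectors:
        raise ValueError("no k-mer of the sequence is in the embedding table")
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


# Hypothetical 2-dimensional embeddings; in practice these would be learned.
toy_table = {"ACG": [1.0, 0.0], "CGT": [0.0, 1.0], "GTA": [1.0, 1.0]}
print(kmers("ACGTA", 3))  # ['ACG', 'CGT', 'GTA']
print(mean_kmer_embedding("ACGTA", 3, toy_table))
```

The appeal noted above is that the embedding is learned once from sequence data, replacing hand-crafted feature extraction and selection.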

Efflux proteins are transport proteins expressed in the plasma membrane that move unwanted toxic substances through specific efflux pumps. Several computational approaches have been proposed to predict transport proteins and thereby to understand the mechanism by which ions move across cell membranes. However, few methods have been developed to identify efflux proteins.

Transient receptor potential (TRP) channels are non-selective cation channels found primarily on the plasma membrane of numerous animal cells. These channels are involved in the physiology and pathophysiology of a wide variety of biological processes, including inhibition and progression of cancer, pain initiation, inflammation, regulation of pressure, thermoregulation, secretion of salivary fluid, and Ca²⁺ and Mg²⁺ homeostasis. Increasing evidence indicates that mutations in the genes encoding TRP channels play an essential role in a broad array of diseases.

In the past decade, scientists have used convolutional neural networks (CNNs) as powerful tools for visual data tasks. However, many CNN-based efforts to predict protein function and extract useful information from protein sequences have certain limitations. In this research, we propose a new method to address the weaknesses of the previous method.

Since 2015, a fast-growing number of deep learning-based methods have been proposed for protein-ligand binding site prediction, and many have achieved promising performance. These methods, however, neglect the imbalanced nature of binding site prediction problems. Traditional data-based approaches for handling data imbalance employ linear interpolation of minority-class samples.
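
The linear interpolation of minority-class samples mentioned above is the idea behind SMOTE (Synthetic Minority Over-sampling Technique): each synthetic sample lies on the segment between a minority sample and one of its nearest minority-class neighbours. A minimal sketch of that traditional baseline, not of the method this work proposes; the function name, parameters, and toy data are illustrative.

```python
import random
from typing import List

Point = List[float]


def smote(minority: List[Point], n_new: int, k: int = 5, seed: int = 0) -> List[Point]:
    """Generate n_new synthetic minority samples by SMOTE-style linear interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a: Point, b: Point) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority-class neighbours of the chosen sample (excluding itself).
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: sq_dist(base, p),
        )[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nn)])
    return synthetic


samples = smote([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], n_new=4, k=2)
print(len(samples))  # 4
```

Because every new point lies inside the region spanned by existing minority samples, this kind of oversampling adds no genuinely new information, which is one motivation for alternative approaches to imbalance.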

Motivation: Primary and secondary active transport are two types of active transport that use energy to move substances. Active transport mechanisms use proteins to assist in transport and play essential roles in regulating the traffic of ions or small molecules across a cell membrane against the concentration gradient. In this study, the two main types of proteins involved in such transport are classified from among transmembrane transport proteins.

Sirtuins are a family of proteins that play a key role in regulating a wide range of cellular processes, including DNA regulation, metabolism, aging/longevity, cell survival, apoptosis, and stress resistance. Sirtuins are protein deacetylases and belong to the class III family of histone deacetylase enzymes (HDACs). The class III HDACs comprise seven members of the sirtuin family, SIRT1 to SIRT7.

It is well known that the major reasons for the rapid proliferation of cancer cells are hypomethylation of the whole cancer genome and hypermethylation of the promoters of particular tumor suppressor genes. Locating 5-methylcytosine (5mC) sites in promoters is therefore a crucial step toward understanding the relationship between promoter methylation and the regulation of mRNA gene expression. High-throughput identification of DNA 5mC in the wet lab is still time-consuming and labor-intensive.

The electron transport chain is a series of protein complexes involved in cellular respiration, an important process for transferring electrons and macromolecules throughout the cell. Identifying flavin adenine dinucleotide (FAD) binding sites in the electron transport chain is vital because it helps biological researchers understand precisely how electrons are produced and transported in cells. This study distills and analyzes contextualized word embeddings from pre-trained BERT models to explore similarities between natural language and protein sequences.

Recently, language representation models have drawn a lot of attention in the field of natural language processing (NLP) due to their remarkable results. Among them, BERT (Bidirectional Encoder Representations from Transformers) has proven to be a simple yet powerful language model that has achieved new state-of-the-art performance. BERT adopted the concept of contextualized word embeddings to capture the semantics and context in which words appear.
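
Contextualized embeddings arise from self-attention: each token's output vector is a weighted mix of all tokens in the input, so the same word (or amino acid) receives different vectors in different contexts. A minimal single-head sketch of scaled dot-product self-attention with identity projections; it deliberately omits the learned query/key/value matrices, multiple heads, and feed-forward layers of a real BERT layer, and the function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(x: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention with identity projections.

    Each output row is a weighted average of all input rows, with weights
    softmax(q . k / sqrt(d)) computed against every token in the input.
    """
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


# The same token vector [1, 0] placed in two different contexts.
print(self_attention([[1.0, 0.0], [0.0, 1.0]])[0])
print(self_attention([[1.0, 0.0], [1.0, 1.0]])[0])
```

The two printed vectors differ even though the first token is identical in both inputs, which is what distinguishes contextualized embeddings from static ones.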

Recently, language representation models have drawn a lot of attention in the natural language processing field due to their remarkable results. Among them, bidirectional encoder representations from transformers (BERT) has proven to be a simple yet powerful language model that achieved new state-of-the-art performance. BERT adopted the concept of contextualized word embeddings to capture the semantics of words and the contexts in which they appear.

Glycosylation is a dynamic enzymatic process that attaches glycans to proteins or other organic molecules such as lipoproteins. Research has shown that this process plays a fundamental role in modulating the functions of ion channel proteins. This study used a computational method to predict N-linked glycosylation sites, the most common type, in ion channel proteins.
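
N-linked glycans are attached to asparagine within the sequon N-X-S/T, where X is any residue except proline. Many N-glycosylation predictors score only asparagines that match this motif; whether this study does so is an assumption. A minimal sketch that lists candidate sequon positions; it ignores the rarer N-X-C variant, and matching the motif does not by itself mean a site is glycosylated. The function name is hypothetical.

```python
import re
from typing import List

# N-X-S/T with X != P; the lookahead lets overlapping sequons all be found.
SEQUON = re.compile(r"(?=N[^P][ST])")


def candidate_nglyc_sites(seq: str) -> List[int]:
    """1-based positions of asparagines that start an N-X-S/T sequon (X != P)."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


print(candidate_nglyc_sites("MNASNPTNNST"))  # [2, 8, 9]
```

A predictor would then classify each candidate, for example from a window of residues or embeddings around it, as glycosylated or not.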
