Computational methods for predicting protein subcellular localization have used various types of features, including N-terminal sorting signals, amino acid compositions, and text annotations from protein databases. Our approach does not use biological knowledge such as sorting signals or homologues; it uses only protein sequence information. The method divides a protein sequence into short $k$-mer sequence fragments, which can be mapped to word features in document classification. A large number of class association rules are mined from protein sequence examples that range from the N-terminus to the C-terminus. A boosting algorithm is then applied to those rules to build the final classifier. Experimental results on benchmark datasets show that our method performs well in terms of both classification performance and test coverage. The results also imply that the $k$-mer sequence features that determine subcellular location do not necessarily occur at specific positions in a protein sequence. An online prediction service implementing our method is available at http://isoft.postech.ac.kr/research/BCAR/subcell.
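The $k$-mer step described above can be illustrated with a minimal sketch. It assumes overlapping fragments and a simple count-based "bag of words" representation; the abstract does not say whether fragments overlap or how they are weighted, so both are assumptions, as are the function name `kmer_features` and the toy sequence. The rule mining and boosting stages are not shown.

```python
from collections import Counter


def kmer_features(sequence: str, k: int = 3) -> Counter:
    """Count overlapping k-mers in a protein sequence.

    Each k-mer plays the role of a "word", so the returned Counter is a
    bag-of-words view of the sequence. Overlapping windows are an
    assumption made for this illustration.
    """
    seq = sequence.upper()
    # A sequence shorter than k yields an empty range and an empty Counter.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Hypothetical toy sequence, not taken from any benchmark dataset.
toy = "MKTAYIAKQRMKTA"
features = kmer_features(toy, k=3)

# Show the k-mers that occur more than once in the toy sequence.
for kmer, count in features.items():
    if count > 1:
        print(kmer, count)
```

Because the counts ignore where a fragment sits in the sequence, this representation matches the abstract's observation that informative $k$-mers need not occur at specific positions.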
DOI: http://dx.doi.org/10.1109/TCBB.2011.131
Am J Hum Genet
January 2025
Department of Neurology, Washington University in St. Louis, St. Louis, MO, USA.
Dysregulation of genes encoding the homologous to E6AP C-terminus (HECT) E3 ubiquitin ligases has been linked to cancer and structural birth defects. One member of this family, the HECT-domain-containing protein 1 (HECTD1), mediates developmental pathways, including cell signaling, gene expression, and embryogenesis. Through GeneMatcher, we identified 14 unrelated individuals with 15 different variants in HECTD1 (10 missense, 3 frameshift, 1 nonsense, and 1 splicing variant) with neurodevelopmental disorders (NDDs), including autism, attention-deficit/hyperactivity disorder, and epilepsy.
Spectrochim Acta A Mol Biomol Spectrosc
January 2025
School of Food Science and Technology, Jiangnan University, Wuxi, PR China.
This study investigates camel milk protein structural dynamics during digestion using Fourier Transform Infrared (FTIR) spectroscopy and Two-Dimensional Infrared (2D-IR) homo-correlation and hetero-correlation analysis. The synchronous 2D-IR homo-correlation map reveals that NH bending and C-N stretching vibrations (amide II) are sensitive to digestion, indicating significant impacts on secondary structures. The asynchronous 2D-IR homo-correlation map indicates a stepwise process in which initial disruptions in NH interactions precede changes in CO stretching vibrations (amide I), highlighting the sequence of structural alterations during protein unfolding and degradation.
J Cheminform
January 2025
School of Systems Biomedical Science, Soongsil University, 369 Sangdo-ro, Dongjak-gu, 06978, Seoul, Republic of Korea.
G protein-coupled receptors (GPCRs) play vital roles in various physiological processes, making them attractive drug discovery targets. Meanwhile, deep learning techniques have revolutionized drug discovery by providing efficient tools that expedite the identification and optimization of ligands. However, existing models for GPCRs often focus on a single target or a small subset of GPCRs, or employ binary classification, which constrains their applicability to high-throughput virtual screening.
Mol Biotechnol
January 2025
Medical Biotechnology and Immunotherapy Research Unit, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, 7700, South Africa.
The field of gene therapy has witnessed significant advancements in the utilization of Adeno-associated virus (AAV) owing to its inherent biological advantages. Targeted AAV vectors are generated through genetic or chemical modification of the capsid for user-directed purposes. However, this process can result in imbalances in viral protein sequence homogeneity, stoichiometry, and functional transduction vector units, thereby introducing new challenges.
Cells Dev
January 2025
Departamento de Neurobiología del Desarrollo y Neurofisiología, Instituto de Neurobiología, Universidad Nacional Autónoma de México, Campus UNAM Juriquilla, Querétaro, Querétaro, Mexico.
fos genes, transcription factors with a common basic region and leucine zipper domains binding to a consensus DNA sequence (TGA(C/G)TCA), are evolutionarily conserved in eukaryotes. Homologs can be found in many different species from yeast to vertebrates. In yeast, the homologous GCN4 gene is required to mediate "emergency" situations like nutrient deprivation and the unfolded protein response.