Background: Long short-term memory (LSTM) is one of the most attractive deep learning methods for learning time series or the context of input data. A growing number of studies, including biological sequence analyses in bioinformatics, use this architecture. Amino acid sequence profiles are widely used in bioinformatics studies such as sequence similarity searches, multiple alignments, and evolutionary analyses. Many biological sequences are now becoming available, and the rapidly increasing amount of sequence data underscores the importance of scalable generators of amino acid sequence profiles.

Results: We employed an LSTM network to develop a novel profile generator that constructs profiles without any assumptions other than the context of the input sequence. Judged by profile-sequence similarity search performance, our method generated better profiles than existing de novo profile generators, including CSBuild and RPS-BLAST, with calculation costs that scale linearly with input sequence size. In addition, we analyzed the effects of the memory power of LSTM and found that LSTM has high potential to detect long-range interactions between amino acids, as in beta-strand formation, which has been a difficult problem in sequence-based protein bioinformatics.
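
To make the idea of a context-driven profile concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single-layer, left-to-right LSTM with random, untrained weights and no bias terms; the names `init_params` and `lstm_profile` and the hidden size of 16 are hypothetical choices for illustration. It shows how one pass over a sequence can emit a 20-way amino acid distribution per position, with one step per residue.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
N_AA = len(AMINO_ACIDS)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def random_matrix(rows, cols, rng):
    """Small random weights; a real model would learn these from data."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def init_params(hidden_size, seed=0):
    """Hypothetical, untrained parameters (bias terms omitted for brevity)."""
    rng = random.Random(seed)
    return {
        "hidden_size": hidden_size,
        # Stacked weights for the input, forget, output, and candidate gates.
        "gates": random_matrix(4 * hidden_size, N_AA + hidden_size, rng),
        # Projection from the hidden state to 20 amino acid scores.
        "out": random_matrix(N_AA, hidden_size, rng),
    }


def lstm_profile(sequence, params):
    """Return one probability row (length 20) per residue of `sequence`."""
    n = params["hidden_size"]
    h = [0.0] * n  # hidden state
    c = [0.0] * n  # cell state: the "memory" carried along the sequence
    profile = []
    for aa in sequence:  # one step per residue, so cost is linear in length
        x = [1.0 if aa == a else 0.0 for a in AMINO_ACIDS]  # one-hot input
        z = matvec(params["gates"], x + h)
        i, f, o, g = z[:n], z[n:2 * n], z[2 * n:3 * n], z[3 * n:]
        c = [sigmoid(f[k]) * c[k] + sigmoid(i[k]) * math.tanh(g[k])
             for k in range(n)]
        h = [sigmoid(o[k]) * math.tanh(c[k]) for k in range(n)]
        # Softmax over the 20 residues gives the profile column.
        logits = matvec(params["out"], h)
        peak = max(logits)
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        profile.append([e / total for e in exps])
    return profile


params = init_params(hidden_size=16)
profile = lstm_profile("MKTAYIAKQR", params)
print(len(profile), len(profile[0]))  # 10 20
```

Because the cell state is updated once per residue and passed forward, each profile row can in principle depend on every earlier position rather than on a fixed window. With untrained weights the probabilities themselves carry no biological meaning.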

Conclusion: We demonstrated the importance of sequence context and the feasibility of LSTM for biological sequence analyses. Our results demonstrated the effectiveness of memory in LSTM and showed that our de novo profile generator, SPBuild, outperformed existing methods in profile prediction for beta-strands, where long-range interactions between amino acids are important and are known to be difficult for existing window-based prediction methods. Our findings will be useful for developing other machine learning methods for prediction tasks on biological sequences.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6052547
DOI: http://dx.doi.org/10.1186/s12859-018-2284-1
