: an R package to extract subsequences from GenBank annotations.

PeerJ

Department of Ecology & Evolutionary Biology, University of Tennessee, Knoxville, TN, USA.

Published: July 2018

Background: DNA sequences are pivotal for a wide array of research in biology. Large sequence databases, such as GenBank, provide a vast resource of DNA sequences for large-scale analyses. However, many GenBank sequence records contain more than one gene or are portions of genomes. Inconsistencies in how genes are annotated, and the numerous synonyms under which a single gene may be listed, pose major challenges for extracting large numbers of subsequences for comparative analysis across taxa. At present, there is no easy way to extract portions of many GenBank accessions based on annotations when gene names vary extensively.

Results: The R package allows users to extract sequences based on GenBank annotations through the ACNUC retrieval system, given a set of accession numbers and search terms listing gene synonyms. The package extracts the subsequences of interest and then writes them to a FASTA file that users can employ in their research.
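The core idea of synonym-based extraction can be illustrated with a minimal Python sketch. This is not the package's R code and does not query ACNUC; it assumes a hypothetical, simplified record layout (a dict with `accession`, `sequence`, and a list of `features`, each carrying 1-based inclusive `start`/`end`, a `strand`, and GenBank-style `qualifiers`). The functions `matches_locus` and `extract_subsequences` are illustrative names. The sketch handles only simple single-span locations, not `join(...)` features.

```python
# Hypothetical sketch, not the package's implementation: match GenBank-style
# feature annotations to a target locus via a synonym list, then cut out the
# annotated subsequence. The record layout is an assumed, simplified stand-in.

COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def matches_locus(qualifiers: dict, synonyms: set) -> bool:
    """True if a gene/product/note qualifier equals any synonym (case-insensitive)."""
    wanted = {s.lower() for s in synonyms}
    for key in ("gene", "product", "note"):
        value = qualifiers.get(key)
        if value is not None and value.lower() in wanted:
            return True
    return False


def extract_subsequences(record: dict, locus: str, synonyms: set) -> list:
    """Return (header, subsequence) pairs for every feature annotated as `locus`."""
    hits = []
    for feature in record["features"]:
        if not matches_locus(feature["qualifiers"], synonyms):
            continue
        # GenBank coordinates are 1-based and inclusive; Python slices are
        # 0-based and end-exclusive, hence start - 1.
        start, end = feature["start"], feature["end"]
        sub = record["sequence"][start - 1:end]
        if feature.get("strand", 1) == -1:
            # Features on the minus strand are reported 5'->3' on that strand.
            sub = reverse_complement(sub)
        hits.append((f"{record['accession']} {locus}", sub))
    return hits


if __name__ == "__main__":
    toy = {
        "accession": "TOY0001",  # placeholder, not a real accession
        "sequence": "AAACCCGGATTT",
        "features": [
            {"start": 4, "end": 6, "strand": 1, "qualifiers": {"gene": "COI"}},
            {"start": 7, "end": 9, "strand": -1, "qualifiers": {"product": "cytochrome b"}},
        ],
    }
    print(extract_subsequences(toy, "COI", {"COI", "COX1", "cytochrome c oxidase subunit I"}))
    print(extract_subsequences(toy, "CYTB", {"cytb", "Cytochrome B"}))
```

Matching against a set of synonyms, rather than a single name, lets one search term list catch records annotated as `COI`, `COX1`, or the full product name. Real annotations vary more than this sketch covers: qualifiers may hold free-text notes, and locations may be split or fuzzy.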

Conclusion: The FASTA files of extracted subsequences and the accession tables generated by the package allow users to quickly find and extract subsequences from GenBank accessions. These sequences can then be incorporated into various analyses, such as the construction of phylogenies to test a wide range of ecological and evolutionary hypotheses.
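The two kinds of output described above can be sketched in Python as well. This is a hedged illustration of the general formats, not the package's actual output code. The table layout is an assumption: one row per species and one column per locus, with the accession that supplied each locus and blanks where nothing was found. `to_fasta` and `accession_table` are hypothetical names.

```python
import csv
import io


def to_fasta(records, width=60):
    """Format (header, sequence) pairs as FASTA text, wrapping sequences at `width`."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


def accession_table(hits, loci):
    """Build CSV text with one row per species and one column per locus.

    `hits` is an iterable of (species, locus, accession) tuples (an assumed
    layout); loci with no hit for a species are left blank.
    """
    rows = {}
    for species, locus, accession in hits:
        rows.setdefault(species, {})[locus] = accession
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Species"] + list(loci))
    for species in sorted(rows):
        writer.writerow([species] + [rows[species].get(locus, "") for locus in loci])
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_fasta([("ACC001 COI", "ACGTACGTAC")], width=4), end="")
    print(accession_table(
        [("Danio rerio", "COI", "ACC001"), ("Danio rerio", "CYTB", "ACC002"),
         ("Homo sapiens", "COI", "ACC003")],  # placeholder accessions
        ["COI", "CYTB"],
    ), end="")
```

A blank cell in such a table makes it easy to see which loci are missing for which taxa before assembling a concatenated alignment.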


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6034590
DOI: http://dx.doi.org/10.7717/peerj.5179
