This paper describes a method for compressing molecular biology databases, which contain an increasing proportion of data derived from genome projects. We tested the performance of our tool on various data files from the EMBL nucleotide sequence database. The best compression ratios were achieved on EST (expressed sequence tag) data, which are typically derived from large-scale sequencing projects. We also tested the compression of sequence database updates in combination with the common Unix compression program 'compress'. On average, our tool improved the efficiency of 'compress' by 16%.
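The abstract does not say how the tool encodes sequences, so the following is only a minimal sketch of the general idea of a sequence-aware step placed in front of a conventional compressor. It assumes a hypothetical 2-bit code for the four unambiguous bases (A, C, G, T), and it swaps in Python's `zlib` for Unix 'compress', which has no standard-library binding. The names `CODE`, `pack`, `unpack`, and `compare` are illustrative and do not come from the paper.

```python
import zlib

# Hypothetical 2-bit code for the four unambiguous bases. The encoding
# actually used by the paper's tool is not described in the abstract.
BASES = "ACGT"
CODE = {base: value for value, base in enumerate(BASES)}


def pack(seq: str) -> bytes:
    """Pack an ACGT-only sequence at four bases per byte.

    A 4-byte big-endian length header records the number of bases so that
    padding in the final byte can be discarded when unpacking.
    """
    out = bytearray(len(seq).to_bytes(4, "big"))
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)


def unpack(data: bytes) -> str:
    """Invert pack(): read the length header, then decode 2 bits per base."""
    n = int.from_bytes(data[:4], "big")
    bases = []
    for byte in data[4:]:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:n])


def compare(seq: str) -> tuple[int, int]:
    """Return (size with zlib alone, size with pack() followed by zlib)."""
    plain = len(zlib.compress(seq.encode("ascii"), 9))
    packed = len(zlib.compress(pack(seq), 9))
    return plain, packed
```

Calling `compare` on a sequence string gives the two compressed sizes side by side. Real database entries also contain annotation lines and ambiguity codes such as N, which this sketch does not handle.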

Source
http://dx.doi.org/10.1093/bioinformatics/11.2.219
