Background: Features of a DNA sequence can be found by compressing the sequence under a suitable model; good compression implies low information content. Good DNA compression models consider repetition, differences between repeats, and base distributions. From a linear DNA sequence, a compression model can produce a linear information sequence. Linear space complexity is important when exploring long DNA sequences on the order of millions of bases. Compressing a sequence in isolation captures information on self-repetition, whereas compressing a sequence Y in the context of another sequence X reveals what new information X gives about Y. This paper presents a methodology for performing comparative analysis to find features exposed by such models.
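
As a rough illustration of what an "information sequence" is, the sketch below computes per-base information content in bits under a simple adaptive order-k Markov model. This is a deliberately simplified stand-in, not the compression model used in the paper, which also accounts for repeats and the differences between them. The function name `information_sequence`, its `order` and `context` parameters, and the Laplace pseudo-counts are all assumptions made for this example. Passing a `context` sequence X trains the model on X before measuring Y, which loosely mimics compressing Y in the context of X.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: input contains only these four bases


def information_sequence(seq, order=2, context=""):
    """Return the information content (bits) of each base in `seq`.

    Hypothetical helper: an adaptive order-`order` Markov model with one
    pseudo-count per base. If `context` is given, the model first learns
    from it without recording any output.
    """
    index = {base: i for i, base in enumerate(ALPHABET)}
    # Each preceding k-mer maps to counts of the base that followed it.
    counts = defaultdict(lambda: [1, 1, 1, 1])

    def scan(text, record):
        info = []
        for i, base in enumerate(text):
            ctx = text[max(0, i - order):i]
            c = counts[ctx]
            if record:
                # Cost of this base given what the model has seen so far.
                info.append(-math.log2(c[index[base]] / sum(c)))
            c[index[base]] += 1  # adapt the model after coding the base
        return info

    scan(context, record=False)
    return scan(seq, record=True)


if __name__ == "__main__":
    y = "ACGT" * 50
    alone = sum(information_sequence(y))
    given_x = sum(information_sequence(y, context=y))
    print(f"alone: {alone:.1f} bits, in context: {given_x:.1f} bits")
```

The output has one value per input base and a single pass over the input, so time is linear in the sequence length. Low values mark positions the model predicts well, such as repeated regions; a drop in cost when a context is supplied suggests that the context carries information about the sequence.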

Results: We apply such a model to find features across chromosomes of Cyanidioschyzon merolae. We present a tool that provides useful linear transformations to investigate and save new sequences. Various examples illustrate the methodology, finding features for sequences alone and in different contexts. We also show how to highlight all sets of self-repetition features, in this case within Plasmodium falciparum chromosome 2.

Conclusion: The methodology finds features that are significant and that biologists confirm. The exploration of long information sequences in linear time and space is fast, and the saved results are self-documenting.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1892068
DOI: http://dx.doi.org/10.1186/1471-2105-8-S2-S10
