Current DNA compression algorithms work by finding similar repeated regions within a DNA sequence and encoding these regions together to achieve compression. Our study of chromosome sequence similarity reveals that similar repeated regions within a single chromosome account for only about 4.5% of the total sequence length, so the compression gain is often low. It is well known that similarity exists among different regions of chromosome sequences, which implies that similar repeated sequences can be found across those regions. Here, we study cross-chromosomal similarity for DNA sequence compression by examining the length and location of similar repeated regions among the sixteen chromosomes of S. cerevisiae. The average percentage of similar subsequences found between two chromosome sequences is about 10%, of which 8% comes from cross-chromosomal prediction and 2% from self-chromosomal prediction. Among all 16 chromosomes studied, the percentage of similar subsequences is about 18%, of which only 1.2% comes from self-chromosomal prediction and the rest from cross-chromosomal prediction. This suggests that cross-chromosomal similarities matter in addition to self-chromosomal similarities in DNA sequence compression. On average, storage space could be reduced by an additional 23% by using both self-chromosomal and cross-chromosomal predictions when compressing the 16 chromosomes of S. cerevisiae.
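The distinction between self-chromosomal and cross-chromosomal prediction can be illustrated with a minimal Python sketch. It is not the method used in the study: it assumes exact k-mer matching as a stand-in for the similarity search, and the function names, the toy sequences, and the choice of k are all hypothetical. The sketch measures what fraction of a target sequence is covered by k-mers that either occur earlier in the same sequence (self) or occur anywhere in another sequence (cross).

```python
def kmer_positions(seq, k):
    """Map each k-mer in seq to the sorted list of its start positions."""
    index = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(seq[i:i + k], []).append(i)
    return index


def covered_fraction(target, reference, k, same_sequence=False):
    """Fraction of target bases lying inside a k-mer that also occurs in reference.

    Assumption: exact matches only. With same_sequence=True the reference
    is the target itself, and a k-mer counts only if it already occurred
    at an earlier position (a decoder would have seen it by then).
    """
    if not target:
        return 0.0
    index = kmer_positions(reference, k)
    covered = [False] * len(target)
    for i in range(len(target) - k + 1):
        hits = index.get(target[i:i + k], [])
        if same_sequence:
            matched = bool(hits) and hits[0] < i  # earliest occurrence precedes i
        else:
            matched = bool(hits)
        if matched:
            for j in range(i, i + k):
                covered[j] = True
    return sum(covered) / len(target)


# Toy sequences standing in for two chromosomes (hypothetical data).
chr_a = "ACGTACGTTTGA"
chr_b = "GGACGTACCC"

self_cov = covered_fraction(chr_a, chr_a, k=4, same_sequence=True)
cross_cov = covered_fraction(chr_a, chr_b, k=4)
print(f"self: {self_cov:.3f}  cross: {cross_cov:.3f}")
```

On real chromosomes a practical search would allow approximate repeats and reverse complements, which exact k-mer matching ignores, so the coverage figures from this sketch are not comparable to the percentages reported above.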

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2533061
DOI: http://dx.doi.org/10.6026/97320630002412