Advances in DNA sequencing technologies have fundamentally changed the nature of genomics. Today's sequencing technologies have led to an explosion in genomic data volume. These data are used in applications that require long-term storage and analysis of genomic sequences, and data-specific compression algorithms can manage such large volumes effectively. Deep learning has recently achieved great success in many compression tools and is gradually being adopted for genomic sequence compression. In particular, autoencoders have been applied to dimensionality reduction, compact data representation, and generative model learning. An autoencoder can use convolutional layers to learn essential features from its input, which suits image and series data. However, an autoencoder reconstructs its input with some loss of information. Because accuracy is critical for genomic data, compressed genomic data must be decompressed without any information loss. We introduce a new scheme to address the loss incurred in the autoencoder's decompressed output. This paper proposes a novel algorithm called GenCoder for reference-free compression of genomic sequences. GenCoder uses a convolutional autoencoder, regenerates the genomic sequences from the latent code the autoencoder produces, and retrieves the original data losslessly. Performance is evaluated on various genomes and benchmark datasets. The experimental results on the tested data demonstrate that the deep learning model used in the proposed compression algorithm generalizes well to genomic sequence data and achieves a compression gain of 27% over the best state-of-the-art method.
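The abstract does not spell out how the loss in the autoencoder's output is corrected, so the following is only a minimal sketch of one common way to make a lossy codec lossless: store the lossy code together with the positions where its reconstruction differs from the original. The subsampling codec here is a hypothetical stand-in for the convolutional autoencoder, and all function names and the correction-list design are assumptions for illustration, not GenCoder's actual method.

```python
# Illustrative only: a crude subsampling codec stands in for the convolutional
# autoencoder, and the correction list is an ASSUMED way to restore
# losslessness. None of these names come from the GenCoder paper.

def lossy_encode(seq: str, stride: int = 4) -> str:
    """Keep every `stride`-th base as a toy 'latent code'."""
    return seq[::stride]


def lossy_decode(code: str, length: int, stride: int = 4) -> str:
    """Approximate the sequence by repeating each kept base `stride` times."""
    return "".join(base * stride for base in code)[:length]


def compress(seq: str, stride: int = 4):
    """Return the lossy code, the original length, and the corrections
    (position, true base) needed to repair the lossy reconstruction."""
    code = lossy_encode(seq, stride)
    approx = lossy_decode(code, len(seq), stride)
    corrections = [(i, b) for i, (a, b) in enumerate(zip(approx, seq)) if a != b]
    return code, len(seq), corrections


def decompress(code: str, length: int, corrections, stride: int = 4) -> str:
    """Rebuild the lossy approximation, then patch every recorded mismatch."""
    bases = list(lossy_decode(code, length, stride))
    for i, b in corrections:
        bases[i] = b
    return "".join(bases)


if __name__ == "__main__":
    original = "ACGTTTTTGGGGACCA"
    code, length, corrections = compress(original)
    restored = decompress(code, length, corrections)
    print(restored == original)
```

In a scheme of this shape, the better the model's reconstruction, the shorter the correction list, so the quality of the learned model directly determines the final compressed size.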
DOI: http://dx.doi.org/10.1109/TCBB.2024.3366240
Gigascience
January 2025
Leibniz Institute for the Analysis of Biodiversity Change, Museum Koenig Bonn, 53113 Bonn, Germany.
Background: In this study, we present an in-depth analysis of the Eurasian minnow (Phoxinus phoxinus) genome, highlighting its genetic diversity, structural variations, and evolutionary adaptations. We generated an annotated haplotype-phased, chromosome-level genome assembly (2n = 50) by integrating high-fidelity (HiFi) long reads and chromosome conformation capture data (Hi-C).
Results: We achieved a haploid size of 940 megabase pairs (Mbp) for haplome 1 and 929 Mbp for haplome 2 with high scaffold N50 values of 36.
Mol Biol Evol
January 2025
Laboratório de Algoritmos em Biologia, Departamento de Genética, Ecologia e Evolução, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Brazil.
A key trait of Eukarya is the independent evolution of complex multicellularity (CM) in animals, plants, fungi, brown algae and red algae. This phenotype is characterized by the initial exaptation of cell-cell adhesion genes followed by the emergence of mechanisms for cell-cell communication, together with the expansion of transcription factor gene families responsible for cell and tissue identity. The number of cell types (NCT) is commonly used as a quantitative proxy for biological complexity in comparative genomics studies.
Front Endocrinol (Lausanne)
January 2025
School of Public Health, Xinjiang Medical University, Urumqi, Xinjiang, China.
Objective: Diabetic neuropathy (DN), a common and debilitating complication of diabetes, significantly impairs the quality of life of affected individuals. While multiple studies have indicated changes in the expression of specific matrix metalloproteinases (MMPs) in patients with DN, and basic research has reported the impact of MMPs on DN, systematic research is lacking and the causal relationship remains unclear. The objective of this research is to investigate the causal relationship between MMPs and DN through two-sample Mendelian randomization (MR).
Front Endocrinol (Lausanne)
January 2025
Department of Urology, The First Affiliated Hospital of Jinzhou Medical University, Jinzhou Medical University, Jinzhou, Liaoning, China.
Objective: The impact of lipid-lowering medications on chronic kidney disease (CKD) remains a subject of debate. This Mendelian randomization (MR) study aims to elucidate the potential effects of lipid-lowering drug targets on CKD development.
Methods: We extracted 11 genetic variants encoding targets of lipid-lowering drugs from published genome-wide association study (GWAS) summary statistics, encompassing LDLR, HMGCR, PCSK9, NPC1L1, APOB, ABCG5/ABCG8, LPL, APOC3, ANGPTL3, and PPARA.
Data Brief
February 2025
Department of Biology, Allama Iqbal Open University, Islamabad, Pakistan.
Plants are colonized by a vast array of microorganisms that outstrip plant cell densities and genes, and are thus referred to as the plant's second genome or extended genome. These microbial communities exert a significant influence on the vigor, growth, development and productivity of plants by supporting nutrient acquisition, organic matter decomposition and tolerance against biotic and abiotic stresses such as heat, high salt, drought and disease, and by regulating plant defense responses. The rhizosphere is a complex micro-ecological zone in the direct vicinity of plant roots and is considered a hotspot of microbial diversity.