Background: Identification of functional elements of a genome often requires dividing a sequence of measurements along a genome into segments where adjacent segments have different properties, such as different mean values. Despite dozens of algorithms developed to address this problem in genomics research, methods with improved accuracy and speed are still needed to effectively tackle both existing and emerging genomic and epigenomic segmentation problems.

Results: We designed an efficient algorithm, called iSeg, for segmentation of genomic and epigenomic profiles. iSeg first utilizes dynamic programming to identify candidate segments and test them for significance. It then uses a novel data structure based on two coupled balanced binary trees to detect overlapping significant segments and update them simultaneously during the searching and refinement stages. Refinement and merging of significant segments are performed at the end to generate the final set of segments. Because its objective function is based on the p-values of the segments, the algorithm can serve as a general computational framework that can be combined with different assumptions about the distribution of the data. As a general segmentation method, it can segment different types of genomic and epigenomic data, such as DNA copy number variation, nucleosome occupancy, nuclease sensitivity, and differential nuclease sensitivity data. Using simple t-tests to compute p-values across multiple datasets of different types, we evaluate iSeg on both simulated and experimental datasets and show that it performs satisfactorily when compared with some other popular methods, which often employ more sophisticated statistical models. Implemented in C++, iSeg is also very computationally efficient and is well suited for large numbers of input profiles and for data with very long sequences.
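The general idea of scoring candidate segments by p-value and keeping the most significant non-overlapping ones can be illustrated with a minimal Python sketch. This is not the iSeg implementation: the function names, the fixed window lengths, and the cutoff are hypothetical, the p-value uses a normal approximation in place of a t-test, and a simple greedy pass over a sorted list stands in for the coupled balanced binary trees described above.

```python
import math
from itertools import accumulate


def segment_p_value(prefix, start, end, sigma):
    """Two-sided p-value that the mean of data[start:end] differs from 0.

    Assumption: normal approximation with a global standard deviation,
    used here only as a stand-in for a t-test.
    """
    n = end - start
    mean = (prefix[end] - prefix[start]) / n  # O(1) via prefix sums
    z = mean / (sigma / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))


def find_segments(data, window_lengths=(5, 10, 20), p_cutoff=1e-3):
    """Return non-overlapping (start, end, p) segments, most significant first kept.

    window_lengths and p_cutoff are hypothetical tuning parameters.
    """
    n = len(data)
    prefix = [0.0] + list(accumulate(data))
    mu = prefix[n] / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / (n - 1))
    if sigma == 0.0:
        return []  # constant profile: nothing to test

    # Score every candidate window and keep those passing the cutoff.
    candidates = []
    for w in window_lengths:
        for start in range(0, n - w + 1):
            p = segment_p_value(prefix, start, start + w, sigma)
            if p < p_cutoff:
                candidates.append((p, start, start + w))

    # Greedy selection: smallest p-value first, skip anything overlapping
    # a segment that has already been chosen.
    candidates.sort()
    chosen = []
    for p, s, e in candidates:
        if all(e <= cs or s >= ce for _, cs, ce in chosen):
            chosen.append((p, s, e))
    return sorted((s, e, p) for p, s, e in chosen)
```

Calling `find_segments(profile)` on a list of per-position measurements returns half-open `(start, end, p)` intervals. The greedy overlap check here is quadratic in the number of selected segments, which is the kind of cost a tree-based overlap structure is meant to avoid.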

Conclusions: We have developed an efficient general-purpose segmentation tool and shown that its results are comparable to or more accurate than those of many of the most popular segment-calling algorithms used in contemporary genomic data analysis. iSeg is capable of analyzing datasets that have both positive and negative values. Tunable parameters allow users to readily adjust the statistical stringency to best match the biological nature of individual datasets, including widely or sparsely mapped genomic datasets or those with non-normal distributions.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5896135
DOI: http://dx.doi.org/10.1186/s12859-018-2140-3


