The human pangenome, a new reference sequence, addresses many limitations of the current GRCh38 reference. The first release is based on 94 high-quality haploid assemblies from individuals with diverse backgrounds. We employed a k-mer indexing strategy for comparative analysis across multiple assemblies, including the pangenome reference, GRCh38, and CHM13, a telomere-to-telomere reference assembly.
Background: Diagnosis of rare genetic diseases can be a long, expensive and complex process, involving an array of tests in the hope of obtaining an actionable result. Long-read sequencing platforms offer the opportunity to make definitive molecular diagnoses using a single assay capable of detecting variants, characterizing methylation patterns, resolving complex rearrangements, and assigning findings to long-range haplotypes. Here, we demonstrate the clinical utility of Nanopore long-read sequencing by validating a confirmatory test for copy number variants (CNVs) in neurodevelopmental disorders and illustrate the broader applications of this platform to assess genomic features with significant clinical implications.
K-mers are short DNA sequences that are used for genome sequence analysis. Applications that use k-mers include genome assembly and alignment. However, the wider bioinformatic use of these short sequences has challenges related to the massive scale of genomic sequence data.
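As an illustration of what a k-mer is, the following minimal Python sketch extracts and counts the k-mers of a DNA string. It is not the method of any paper listed here. The function names (`reverse_complement`, `canonical_kmers`, `count_kmers`) are hypothetical, and the use of canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement) is an assumed, commonly used convention. An in-memory counter like this does not scale to whole genomes, which is the challenge the abstract above refers to.

```python
from collections import Counter

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_kmers(seq: str, k: int):
    """Yield the canonical form of every k-mer in seq.

    The canonical form is the lexicographically smaller of the k-mer and
    its reverse complement, so both strands map to the same key.
    K-mers containing anything other than A, C, G or T are skipped.
    """
    seq = seq.upper()
    valid = set("ACGT")
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= valid:
            yield min(kmer, reverse_complement(kmer))


def count_kmers(seq: str, k: int) -> Counter:
    """Count canonical k-mers in a single sequence (in memory)."""
    return Counter(canonical_kmers(seq, k))


if __name__ == "__main__":
    # Toy sequence; real inputs would be reads or assembled contigs.
    print(count_kmers("ACGTACGT", 4))
```

Counting canonical k-mers means a sequence and its reverse complement produce identical counts, which matters when comparing assemblies whose contigs may be reported on either strand.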
Dysbiosis is an imbalance of an organ's microbiome and plays a role in colorectal cancer pathogenesis. Characterizing the bacteria in the microenvironment of a cancer through genome sequencing has advantages compared to culture-based profiling. However, there are notable technical and analytical challenges in characterizing universal features of tumor microbiomes.
We developed a sensitive sequencing approach that simultaneously profiles microsatellite instability, chromosomal instability, and subclonal structure in cancer. We assessed diverse repeat motifs across 225 microsatellites on colorectal carcinomas. Our study identified elevated alterations at both selected tetranucleotide and conventional mononucleotide repeats.
DNA copy number aberrations (CNA) are frequently observed in colorectal cancers (CRC). There is an urgent need for CNA-based biomarkers in clinics. For Stage III CRC, if combined with imaging or pathologic evidence, these markers promise more precise care.
The human genome is composed of two haplotypes, otherwise called diplotypes, which denote phased polymorphisms and structural variations (SVs) that are derived from both parents. Diplotypes place genetic variants in the context of cis-related variants from a diploid genome. As a result, they provide valuable information about hereditary transmission, context of SV, regulation of gene expression and other features which are informative for understanding human genetics.
HepG2 is one of the most widely used human cancer cell lines in biomedical research and one of the main cell lines of ENCODE. Although the functional genomic and epigenomic characteristics of HepG2 are extensively studied, its genome sequence has never been comprehensively analyzed and higher order genomic structural features are largely unknown. The high degree of aneuploidy in HepG2 renders traditional genome variant analysis methods challenging and partially ineffective.
K562 is widely used in biomedical research. It is one of three tier-one cell lines of ENCODE and is also the cell line most commonly used for large-scale CRISPR/Cas9 screens. Although its functional genomic and epigenomic characteristics have been extensively studied, its genome sequence and genomic structural features have never been comprehensively analyzed.
Variable tumor cellularity can limit sensitivity and precision in comparative genomics because differences in tumor content can result in misclassifying truncal mutations as region-specific private mutations in stroma-rich regions, especially when studying tissue specimens of mediocre tumor cellularity such as lung adenocarcinomas (LUADs). To address this issue, we refined a nuclei flow-sorting approach by sorting nuclei based on ploidy and the LUAD lineage marker thyroid transcription factor 1 and applied this method to investigate genome-wide somatic copy number aberrations (SCNAs) and mutations of 409 cancer genes in 39 tumor populations obtained from 16 primary tumors and 21 matched metastases. This approach increased the mean tumor purity from 54% (range 7-89%) of unsorted material to 92% (range 79-99%) after sorting.
Genomic instability is a frequently occurring feature of cancer that involves large-scale structural alterations. These somatic changes in chromosome structure include duplication of entire chromosome arms and aneuploidy where chromosomes are duplicated beyond normal diploid content. However, the accurate determination of aneuploidy events in cancer genomes is a challenge.
Background: Genome rearrangements are critical oncogenic driver events in many malignancies. However, the identification and resolution of the structure of cancer genomic rearrangements remain challenging even with whole genome sequencing.
Methods: To identify oncogenic genomic rearrangements and resolve their structure, we analyzed linked-read sequencing data.