Motivation: Centromeres are chromosomal regions historically understudied with sequencing technologies due to their repetitive nature and short-read mapping limitations. However, recent improvements in long-read sequencing allow for the investigation of complex regions of the genome at the sequence and epigenetic levels.
Results: Here, we present Centromere Dip Region (CDR)-Finder: a tool to identify regions of hypomethylation within the centromeres of high-quality, contiguous genome assemblies.
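The abstract does not describe CDR-Finder's algorithm. To illustrate what "a region of hypomethylation" means in binned methylation data, the minimal sketch below flags bins whose methylated fraction falls well below the median for the region and merges adjacent flagged bins into dip intervals. The function name `find_dips`, the `drop` threshold, and the input layout (sorted, non-overlapping `(start, end, fraction)` bins) are hypothetical assumptions. This is not CDR-Finder's implementation.

```python
from statistics import median
from typing import List, Tuple


def find_dips(bins: List[Tuple[int, int, float]], drop: float = 0.3) -> List[Tuple[int, int]]:
    """Return merged (start, end) intervals whose methylated fraction lies
    at least `drop` below the region-wide median (hypothetical heuristic).

    bins: non-empty, sorted, non-overlapping (start, end, fraction) tuples.
    """
    baseline = median(frac for _, _, frac in bins)
    threshold = baseline - drop
    dips: List[Tuple[int, int]] = []
    for start, end, frac in bins:
        if frac > threshold:
            continue  # bin is not markedly hypomethylated
        # Extend the previous dip when this bin is contiguous with it.
        if dips and dips[-1][1] == start:
            dips[-1] = (dips[-1][0], end)
        else:
            dips.append((start, end))
    return dips


bins = [
    (0, 5000, 0.85), (5000, 10000, 0.82), (10000, 15000, 0.40),
    (15000, 20000, 0.35), (20000, 25000, 0.80), (25000, 30000, 0.84),
]
print(find_dips(bins))  # [(10000, 20000)]
```

A median baseline is used here so that the dip itself does not drag the reference level down much. A real tool would likely also need to handle coverage, bin size, and noise, which this sketch ignores.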
Clinical prediction models (CPMs) are tools that compute the risk of an outcome given a set of patient characteristics. They are routinely used to inform patients, guide treatment decision-making, and allocate resources. Although much hope has been placed on CPMs to mitigate human biases, CPMs may contribute to racial disparities in decision-making and resource allocation. While some policymakers, professional organizations, and scholars have called for eliminating race as a variable from CPMs, others raise concerns that excluding race may exacerbate healthcare disparities; this controversy remains unresolved.
Previous studies suggested that the copy number of the human salivary amylase gene correlates with starch-rich diets. However, evolutionary analyses are hampered by the absence of accurate, sequence-resolved haplotype variation maps. We identified 30 structurally distinct haplotypes at nucleotide resolution among 98 present-day humans, revealing that the coding sequences of the amylase gene copies are evolving under negative selection.
Haplotype information is crucial for biomedical and population genetics research. However, current strategies to produce de novo haplotype-resolved assemblies often require either difficult-to-acquire parental data or an intermediate haplotype-collapsed assembly. Here, we present Graphasing, a workflow which synthesizes the global phase signal of Strand-seq with assembly graph topology to produce chromosome-scale de novo haplotypes for diploid genomes.
Motivation: It is now possible to routinely generate highly contiguous (near telomere-to-telomere) genome assemblies of human and nonhuman species. Complex structural variation and regions of rapid evolutionary turnover are being discovered for the first time. Thus, efficient and informative visualization tools are needed to evaluate and directly observe structural differences between two or more genomes.
The 10q11.22 chromosomal region is a duplication-rich interval of the human genome and one of the last to be fully assembled. It carries copy number-variable genes associated with intellectual disability, bipolar disorder, and obesity.
Using five complementary short- and long-read sequencing technologies, we phased and assembled >95% of each diploid human genome in a four-generation, 28-member family (CEPH 1463), allowing us to systematically assess de novo mutations (DNMs) and recombination. From this family, we estimate an average of 192 DNMs per generation, including 75.5 single-nucleotide variants (SNVs), 7.
Segmental duplications (SDs) contribute significantly to human disease, evolution, and diversity yet have been difficult to resolve at the sequence level. We present a population genetics survey of SDs by analyzing 170 human genome assemblies where the majority of SDs are fully resolved using long-read sequence assembly. Excluding the acrocentric short arms, we identify 173.
Conventional life-history theory predicts that energy-demanding events such as reproduction and migration must be temporally segregated to avoid resource limitation. Here, we provide, to our knowledge, the first direct evidence of 'itinerant breeding' in a migratory bird, an incredibly rare breeding strategy (less than 0.1% of extant bird species) that involves the temporal overlap of migratory and reproductive periods of the annual cycle.
The secreted mucins MUC5AC and MUC5B play critical defensive roles in airway pathogen entrapment and mucociliary clearance by encoding large glycoproteins with variable number tandem repeats (VNTRs). These polymorphic and degenerate protein-coding VNTRs make the loci difficult to investigate with short reads. We characterize the structural diversity of MUC5AC and MUC5B by long-read sequencing and assembly of 206 human and 20 nonhuman primate (NHP) haplotypes.
Haplotype information is crucial for biomedical and population genetics research. However, current strategies to produce haplotype-resolved assemblies often require either difficult-to-acquire parental data or an intermediate haplotype-collapsed assembly. Here, we present Graphasing, a workflow which synthesizes the global phase signal of Strand-seq with assembly graph topology to produce chromosome-scale haplotypes for diploid genomes.
Down syndrome is the most common form of human intellectual disability caused by precocious segregation and nondisjunction of chromosome 21. Differences in centromere structure have been hypothesized to play a potential role in this process in addition to the well-established risk of advancing maternal age. Using long-read sequencing, we completely sequenced and assembled the centromeres from a parent-child trio where Trisomy 21 arose in the child as a result of a meiosis I error.
Advances in long-read sequencing (LRS) technologies continue to make whole-genome sequencing more complete, affordable, and accurate. LRS provides significant advantages over short-read sequencing approaches, including phased de novo genome assembly, access to previously excluded genomic regions, and discovery of more complex structural variants (SVs) associated with disease. Limitations remain with respect to cost, scalability, and platform-dependent read accuracy, and the tradeoffs between sequence coverage and sensitivity of variant discovery are important experimental considerations for the application of LRS.
Background: Animals select and interact with their environment in various ways, including to keep their physiology at optimal capacity, to access prey, and to avoid predators. These goals often conflict, and the balance struck between them may vary depending on an individual's life history and condition. The common lizard (Zootoca vivipara) has egg-laying and live-bearing lineages and displays a variety of dorsal patterns and colouration.
The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region.
The prevalence of highly repetitive sequences within the human Y chromosome has prevented its complete assembly to date and led to its systematic omission from genomic analyses. Here we present de novo assemblies of 43 Y chromosomes spanning 182,900 years of human evolution and report considerable diversity in size and structure. Half of the male-specific euchromatic region is subject to large inversions with a greater than twofold higher recurrence rate compared with all other chromosomes.
Due to the COVID-19 pandemic, the virus responsible, SARS-CoV-2, became a source of intense interest for non-expert audiences. The viral spike protein gained particular public interest as the main target for protective immune responses, including those elicited by vaccines. The rapid evolution of SARS-CoV-2 resulted in variations in the spike that enhanced transmissibility or weakened vaccine protection.
Objective: Missed or cancelled imaging tests may be invisible to the ordering clinician and result in diagnostic delay. We developed an outpatient results notification tool (ORNT) to alert physicians of patients' missed radiology studies.
Design: Randomised controlled evaluation of a quality improvement intervention.