Publications by authors named "Allison Rozanski"

Motivation: Centromeres are chromosomal regions historically understudied with sequencing technologies due to their repetitive nature and short-read mapping limitations. However, recent improvements in long-read sequencing allow for the investigation of complex regions of the genome at the sequence and epigenetic levels.

Results: Here, we present Centromere Dip Region (CDR)-Finder: a tool to identify regions of hypomethylation within the centromeres of high-quality, contiguous genome assemblies.
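
As a rough illustration of what "identifying regions of hypomethylation" can mean computationally, the sketch below scans binned methylation fractions across a centromere and reports runs of bins that fall well below the median. This is not CDR-Finder's actual algorithm or interface: the function name `find_dip_regions`, the `drop` and `min_bins` parameters, and the median-based threshold are all assumptions made for illustration.

```python
from statistics import median


def find_dip_regions(bin_starts, methylation, bin_size, drop=0.2, min_bins=2):
    """Return (start, end) intervals whose methylation is well below the median.

    Hypothetical sketch: a bin counts as a dip when its methylation fraction
    is more than `drop` below the median of all bins, and a region is
    reported only if at least `min_bins` consecutive bins qualify.
    """
    threshold = median(methylation) - drop
    regions = []
    run_start = None  # index of the first bin in the current dip, if any

    for i, value in enumerate(methylation):
        if value < threshold:
            if run_start is None:
                run_start = i
        else:
            # The run (if any) ended at bin i - 1; keep it if long enough.
            if run_start is not None and i - run_start >= min_bins:
                regions.append((bin_starts[run_start], bin_starts[i - 1] + bin_size))
            run_start = None

    # Close a run that extends to the last bin.
    if run_start is not None and len(methylation) - run_start >= min_bins:
        regions.append((bin_starts[run_start], bin_starts[-1] + bin_size))
    return regions


# Toy input: eight 5 kb bins with a three-bin dip in the middle.
starts = [i * 5000 for i in range(8)]
fractions = [0.8, 0.8, 0.8, 0.3, 0.2, 0.3, 0.8, 0.8]
dips = find_dip_regions(starts, fractions, bin_size=5000)
```

A median-relative threshold is used here only because it adapts to the overall methylation level of each centromere; a real tool would also need to restrict the search to annotated centromeric sequence.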


Centromeres are chromosomal regions historically understudied with sequencing technologies due to their repetitive nature and short-read mapping limitations. However, recent improvements in long-read sequencing allow for the investigation of complex regions of the genome at the sequence and epigenetic levels. Here, we present Centromere Dip Region (CDR)-Finder: a tool to identify regions of hypomethylation within the centromeres of high-quality, contiguous genome assemblies.

Article Synopsis
  • Human centromeres are challenging to sequence due to their large size and repetitive nature, limiting our understanding of their variation and evolutionary function.
  • Using long-read sequencing, researchers completely sequenced and assembled all centromeres from a second human genome, revealing a significant increase in genetic variation and size differences between centromeres.
  • Comparative analysis of centromeric sequences across species, including humans and great apes, highlights the rapid evolution of α-satellite DNA and suggests limited recombination between chromosome arms, aiding in studying centromeric DNA evolution.

Down syndrome is the most common form of human intellectual disability caused by precocious segregation and nondisjunction of chromosome 21. Differences in centromere structure have been hypothesized to play a potential role in this process in addition to the well-established risk of advancing maternal age. Using long-read sequencing, we completely sequenced and assembled the centromeres from a parent-child trio where Trisomy 21 arose in the child as a result of a meiosis I error.

Article Synopsis
  • We discovered over 1.3 million lineage-specific structural variants (SVs) that impact thousands of protein-coding genes and regulatory elements, revealing significant genomic differences among primates, especially compared to humans.
  • Our research identified 1,607 regions with structural variations that are hotspots for gene loss and creation, indicating areas in the genome subject to rapid evolution and natural selection across primate species.

We completely sequenced and assembled all centromeres from a second human genome and used two reference sets to benchmark genetic, epigenetic, and evolutionary variation within centromeres from a diversity panel of humans and apes. We find that centromere single-nucleotide variation can increase by up to 4.1-fold relative to other genomic regions, with the caveat that up to 45.


Single-nucleotide variants (SNVs) in segmental duplications (SDs) have not been systematically assessed because of the limitations of mapping short-read sequencing data. Here we constructed 1:1 unambiguous alignments spanning high-identity SDs across 102 human haplotypes and compared the pattern of SNVs between unique and duplicated regions. We find that human SNVs are elevated 60% in SDs compared to unique regions and estimate that at least 23% of this increase is due to interlocus gene conversion (IGC) with up to 4.

Article Synopsis
  • Recent advancements in phased genome assembly, especially using long-read data and parental information, still leave significant gaps, averaging over 140 per assembly from trio-hifiasm methods.
  • A comprehensive analysis of 182 haploid assemblies shows that chromosome-wide accuracy is similar when using Strand-seq instead of parental data, with many gaps clustering near large repeat regions.
  • The research highlights that a considerable amount of human DNA is misoriented and includes notable variations like deletions and insertions, suggesting key areas for future algorithm improvements and better pangenome models.

The telomere-to-telomere (T2T) complete human reference has significantly improved our ability to characterize genome structural variation. To understand its impact on inversion polymorphisms, we remapped data from 41 genomes against the T2T reference genome and compared it to the GRCh38 reference. We find a ~21% increase in sensitivity, improving mapping of 63 inversions on the T2T reference.


To better understand the pattern of primate genome structural variation, we used multiple long-read sequencing technologies to sequence and assemble the genomes of eight nonhuman primate species, including New World monkeys (owl monkey and marmoset), an Old World monkey (macaque), Asian apes (orangutan and gibbon), and African ape lineages (gorilla, bonobo, and chimpanzee). Compared to the human genome, we identified 1,338,997 lineage-specific fixed structural variants (SVs) disrupting 1,561 protein-coding genes and 136,932 regulatory elements, including the most complete set of human-specific fixed differences. Across 50 million years of primate evolution, we estimate that 819.


Motivation: Highly contiguous de novo phased diploid genome assemblies are now feasible for large numbers of species and individuals. Methods are needed to validate assembly accuracy and detect misassemblies with orthologous sequencing data to allow for confident downstream analyses.

Results: We developed GAVISUNK, an open-source pipeline that detects misassemblies and produces a set of reliable regions genome-wide by assessing concordance of distances between unique k-mers in Pacific Biosciences high-fidelity assemblies and raw Oxford Nanopore Technologies reads.


Obligate insect social parasites evolve traits to effectively locate and then exploit their hosts, whereas hosts have complex social behavioral repertoires, which include sensory recognition to reject potential conspecific intruders and heterospecific parasites. While social parasites and host behaviors have been studied extensively, less is known about how their sensory systems function to meet their specific selective pressures. Here, we compare investment in visual and olfactory brain regions in the paper wasp Polistes dominula, and its obligate social parasite P.
