Background: Knowing the phase of marker genotype data can be useful in genome-wide association studies, because it makes it possible to use analysis frameworks that account for identity by descent or parent of origin of alleles and it can lead to a large increase in data quantities via genotype or sequence imputation. Long-range phasing and haplotype library imputation constitute a fast and accurate method to impute phase for SNP data.

Methods: A long-range phasing and haplotype library imputation algorithm was developed. It combines information from surrogate parents and long haplotypes to resolve phase in a manner that is not dependent on the family structure of a dataset or on the presence of pedigree information.
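The surrogate-parent idea can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes genotypes coded as 0, 1, or 2 copies of one allele (None for missing), and that two individuals are candidate surrogates for each other when they show no opposing homozygous genotypes across a stretch of SNPs. The function names and the `max_opposing` threshold are hypothetical.

```python
def opposing_homozygotes(g1, g2):
    """Count loci where one individual is 0 and the other is 2 (missing loci skipped)."""
    return sum(
        1
        for a, b in zip(g1, g2)
        if a is not None and b is not None and abs(a - b) == 2
    )


def is_surrogate_parent(g1, g2, max_opposing=0):
    """Assumed rule: no opposing homozygotes means a haplotype may be shared."""
    return opposing_homozygotes(g1, g2) <= max_opposing


def phase_from_surrogate(target, surrogate):
    """Resolve phase of `target` using one surrogate assumed to share a haplotype.

    Returns (shared_haplotype, other_haplotype) with alleles 0/1, or None
    where the locus cannot be resolved from this surrogate alone.
    """
    shared, other = [], []
    for t, s in zip(target, surrogate):
        if t in (0, 2):
            # Homozygous target: both haplotypes carry the same allele.
            allele = t // 2
            shared.append(allele)
            other.append(allele)
        elif t == 1 and s in (0, 2):
            # Heterozygous target, homozygous surrogate: the shared
            # haplotype must carry the surrogate's allele.
            shared.append(s // 2)
            other.append(1 - s // 2)
        else:
            # Both heterozygous, or data missing: unresolved here.
            shared.append(None)
            other.append(None)
    return shared, other


target = [0, 1, 2, 1, 1]
surrogate = [0, 0, 1, 2, 1]
if is_surrogate_parent(target, surrogate):
    shared_hap, other_hap = phase_from_surrogate(target, surrogate)
```

Loci left unresolved by one surrogate (here the last SNP, where both individuals are heterozygous) would need further surrogates or a haplotype library to fill in, which is the role the abstract assigns to combining the two sources of information.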

Results: The algorithm performed well in both simulated and real livestock and human datasets in terms of both phasing accuracy and computational efficiency. The percentage of alleles that could be phased in both simulated and real datasets of varying size generally exceeded 98%, while the percentage of alleles incorrectly phased in simulated data was generally less than 0.5%. The accuracy of phasing was affected by dataset size, with lower accuracy for datasets of fewer than 1000 individuals, but was not affected by effective population size, family data structure, presence or absence of pedigree information, or SNP density. The method was computationally fast. In comparison to a commonly used statistical method (fastPHASE), the current method made about 8% fewer phasing mistakes and ran about 26 times faster for a small dataset. For larger datasets, the differences in computational time are expected to be even greater. A computer program implementing these methods has been made available.
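The two summary statistics reported above can be computed along the following lines when true haplotypes are known, as in simulated data. This is a sketch under stated assumptions: the abstract does not give the denominator for the error percentage, so it is taken here over phased alleles only; haplotypes are assumed to be compared in a consistent order; and the function name is hypothetical.

```python
def phasing_summary(true_hap, inferred_hap):
    """Return (percent of alleles phased, percent of phased alleles that are wrong).

    `inferred_hap` uses None for alleles the method left unphased.
    Assumption: errors are expressed relative to phased alleles only.
    """
    total = len(true_hap)
    phased = [(t, i) for t, i in zip(true_hap, inferred_hap) if i is not None]
    wrong = sum(1 for t, i in phased if t != i)
    pct_phased = 100.0 * len(phased) / total if total else 0.0
    pct_wrong = 100.0 * wrong / len(phased) if phased else 0.0
    return pct_phased, pct_wrong


true_hap = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
inferred_hap = [0, 1, None, 0, None, 0, 1, 1, 1, 0]
pct_phased, pct_wrong = phasing_summary(true_hap, inferred_hap)
```

The toy values are illustrative only and are not drawn from the study.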

Conclusions: The algorithm and software developed in this study make feasible the routine phasing of high-density SNP chips in large datasets.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3068938
DOI: http://dx.doi.org/10.1186/1297-9686-43-12
