Background: Clinically effective and safe genotyping relies on correct reference sequences, often represented by haplotypes. The 1000 Genomes Project recorded individual genotypes across 26 different populations and, using computerized genotype phasing, reported haplotype data. In contrast, we identified long reference sequences by analyzing the homozygous genomic regions in this online database, a concept that has rarely been reported since next generation sequencing data became available.
Study Design And Methods: Phased genotype data for an 80.6 kb region of chromosome 1 were downloaded for all 2,504 unrelated individuals of the 1000 Genomes Project Phase 3 cohort. The region was centered on the ACKR1 gene and bordered by the CADM3 and FCER1A genes. Individuals with heterozygosity at a single site or with complete homozygosity allowed unambiguous assignment of an ACKR1 haplotype. A computer algorithm was developed to extract these haplotypes from the 1000 Genomes Project data in an automated fashion. A manual analysis validated the data extracted by the algorithm.
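The selection rule described above (complete homozygosity, or heterozygosity at exactly one site) can be illustrated with a minimal Python sketch. This is not the authors' algorithm; the function name `unambiguous_haplotypes` and the representation of genotypes as pairs of allele indices are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: one (allele_a, allele_b) pair per variant site,
# e.g. 0 = reference allele, 1 = alternate allele.
Genotype = Tuple[int, int]


def unambiguous_haplotypes(
    genotypes: List[Genotype],
) -> Optional[Tuple[List[int], List[int]]]:
    """Return both haplotypes if at most one site is heterozygous, else None.

    With zero heterozygous sites the two haplotypes are identical; with one,
    they differ only at that site. In both cases no phasing is required.
    """
    het_sites = [i for i, (a, b) in enumerate(genotypes) if a != b]
    if len(het_sites) > 1:
        # Two or more heterozygous sites: the haplotypes depend on phase,
        # so this individual is skipped in this sketch.
        return None
    hap1 = [a for a, _ in genotypes]
    hap2 = [b for _, b in genotypes]
    return hap1, hap2
```

For example, `unambiguous_haplotypes([(0, 0), (1, 1), (0, 1)])` yields two haplotypes that differ only at the last site, whereas an individual with two heterozygous sites returns `None`.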
Results: We confirmed 902 ACKR1 haplotypes of varying lengths, the longest at 80,584 nucleotides and the shortest at 1,901 nucleotides. The haplotype sequences had a combined length of 19,895,388 nucleotides and a median length of 16,014 nucleotides. Based on our approach, all haplotypes can be considered experimentally confirmed and not affected by the known errors of computerized genotype phasing.
Conclusions: Tracts of homozygosity can provide definitive reference sequences for any gene. They are particularly useful when observed in unrelated individuals of large-scale sequence databases. As a proof of principle, we explored the 1000 Genomes Project database for ACKR1 gene data and mined long haplotypes. These haplotypes are useful for high-throughput analysis with next-generation sequencing. Our approach is scalable, using automated bioinformatics tools, and can be applied to any gene.
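One way to picture a tract of homozygosity around a gene is the sketch below, which extends outward from an anchor variant site until the first heterozygous site on each side. It is a simplified illustration under stated assumptions, not the published method: the function name `homozygous_tract`, the anchor-index argument, and the restriction to fully homozygous sites (ignoring the single-heterozygous-site case) are all hypothetical choices.

```python
from typing import List, Optional, Tuple


def homozygous_tract(
    positions: List[int],
    genotypes: List[Tuple[int, int]],
    anchor: int,
) -> Optional[Tuple[int, int]]:
    """Return (first, last) variant positions of the homozygous run around anchor.

    positions: sorted coordinates of the variant sites (assumed input).
    genotypes: one (allele_a, allele_b) pair per site, same order as positions.
    anchor:    index of a variant site inside the gene of interest.
    Returns None if the anchor site itself is heterozygous.
    """
    def is_hom(i: int) -> bool:
        a, b = genotypes[i]
        return a == b

    if not is_hom(anchor):
        return None
    left = right = anchor
    # Extend in each direction while the neighbouring site is homozygous.
    while left > 0 and is_hom(left - 1):
        left -= 1
    while right < len(positions) - 1 and is_hom(right + 1):
        right += 1
    return positions[left], positions[right]
```

The returned bounds are conservative, since they stop at the outermost homozygous variant site rather than at the flanking heterozygous sites. Tract length will differ between individuals, which is consistent with haplotypes of varying lengths.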
Full text | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8150616 | PMC |
http://dx.doi.org/10.1186/s12859-021-04169-6 | DOI Listing |