Sequence homology hinders variant calling in segmental duplications. We developed Paraphase, a HiFi-based informatics method that resolves highly similar genes by phasing all haplotypes of paralogous genes jointly. We applied Paraphase to 160 long (>10 kb) segmental duplication regions across the human genome with high (>99%) sequence similarity, encoding 316 genes. Analysis across five ancestral populations revealed highly variable copy numbers of these regions. We identified 23 paralog groups with exceptionally low within-group diversity, where extensive gene conversion and unequal crossing over contribute to highly similar gene copies. Furthermore, our analysis of 36 trios identified 7 de novo SNVs and 4 de novo gene conversion events, 2 of which are non-allelic. Finally, we summarized extensive genetic diversity in 9 medically relevant genes previously considered challenging to genotype. Paraphase provides a framework for resolving gene paralogs, enabling accurate testing in medically relevant genes and population-wide studies of previously inaccessible genes.
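To make "phasing all haplotypes of paralogous genes jointly" concrete, here is a toy sketch, not Paraphase's implementation. It assumes reads from every paralog have already been realigned to a single reference copy and reduced to their bases at variant sites. The `Read` representation, the helpers `overlap`, `compatible`, and `phase_reads`, and the greedy longest-first assignment rule are all hypothetical. Under the further assumption that each distinct haplotype corresponds to one gene copy, the number of haplotypes gives a total copy-number estimate across the paralog group.

```python
from typing import Dict, List, Optional

# Hypothetical read representation: variant-site position -> base observed.
# Sites a read does not span are simply absent from the dict.
Read = Dict[int, str]


def overlap(hap: Read, read: Read) -> int:
    """Number of variant sites covered by both the haplotype and the read."""
    return sum(1 for pos in read if pos in hap)


def compatible(hap: Read, read: Read) -> bool:
    """True if the read matches the haplotype at every shared site."""
    return all(hap[pos] == base for pos, base in read.items() if pos in hap)


def phase_reads(reads: List[Read]) -> List[Read]:
    """Greedily phase reads pooled from all paralogs into haplotypes.

    Reads are visited longest-first (ties keep input order). Each read joins
    the compatible haplotype with the largest overlap; a read that matches
    none seeds a new haplotype. Toy rule: no sequencing-error model and no
    read-quality weighting.
    """
    haplotypes: List[Read] = []
    for read in sorted(reads, key=len, reverse=True):
        best: Optional[Read] = None
        best_overlap = 0
        for hap in haplotypes:
            ov = overlap(hap, read)
            if ov > best_overlap and compatible(hap, read):
                best, best_overlap = hap, ov
        if best is None:
            haplotypes.append(dict(read))
        else:
            best.update(read)  # extend the haplotype with any newly covered sites
    return haplotypes


# Hypothetical example: two paralogs in a diploid genome (four gene copies),
# all realigned to one reference copy with variant sites at 100, 200, 300.
reads = [
    {100: "A", 200: "C", 300: "G"},
    {100: "A", 200: "C"},
    {100: "A", 200: "T", 300: "G"},
    {200: "T", 300: "G"},
    {100: "G", 200: "C", 300: "T"},
    {100: "G", 200: "T", 300: "T"},
    {200: "T", 300: "T"},
]
haplotypes = phase_reads(reads)
print(len(haplotypes))  # 4 haplotypes, i.e. 4 gene copies under the toy assumption
```

Visiting longer reads first is a deliberate choice in this sketch: long HiFi reads span more paralog-distinguishing sites, so they seed haplotypes that shorter reads can then join unambiguously. With fragmented coverage, reads that share no site with any existing haplotype would each seed a new one and inflate the count, which a real method must handle.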


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11890787
DOI: http://dx.doi.org/10.1038/s41467-025-57505-2

Similar Publications

Functional diversity of two apple paralogs MADS5 and MADS35 in regulating flowering and parthenocarpy.

Plant Physiol Biochem

March 2025

State Key Laboratory for Crop Stress Resistance and High-Efficiency Production/Shaanxi Key Laboratory of Apple, College of Horticulture, Northwest A&F University, Yangling, Shaanxi, 712100, China. Electronic address:

MADS-box genes play important roles in plant development, especially flowering and fruiting. In this study, we identified 54 type I and 69 type II MADS-box genes from the apple reference genome 'GDDH13'. The type II MADS-box genes were further divided into 12 closely related subgroups, each exhibiting similar gene structures and conserved domains.

Implication of ribosomal protein in abiotic and biotic stress.

Planta

March 2025

Department of Chemistry, Biochemistry and Physics and Groupe de Recherche en Biologie Végétale, Université du Québec À Trois-Rivières, Trois-Rivières, Québec, G9A 5H9, Canada.

This review article explores the intricate roles and regulation of ribosomal proteins in response to stress, emphasizing their pivotal role in ameliorating abiotic and biotic stress conditions in crop plants. Plants must coordinate ribosome production to balance cellular protein synthesis in response to environmental variation and pathogen invasion. Over the past decade, research has revealed that ribosome subgroups respond to adverse conditions, suggesting that this tight coordination may be grounded in the induction of ribosome variants that produce differential translation outcomes.

Cryptic genetic variants exert minimal or no phenotypic effects alone but have long been hypothesized to form a vast, hidden reservoir of genetic diversity that drives trait evolvability through epistatic interactions. This classical theory has been reinvigorated by pan-genome sequencing, which has revealed pervasive variation within gene families and regulatory networks, including extensive cis-regulatory changes, gene duplication, and divergence between paralogs. Nevertheless, empirical testing of cryptic variation's capacity to fuel phenotypic diversification has been hindered by intractable genetics, limited allelic diversity, and inadequate phenotypic resolution.

Mouse models represent a powerful platform to study genes and variants associated with human diseases. While genome editing technologies have increased the rate and precision of model development, predicting and installing specific types of mutations in mice that mimic the native human genetic context is complicated. Computational tools can identify and align orthologous wild-type genetic sequences from different species; however, predictive modeling and engineering of equivalent mouse variants that mirror the nucleotide and/or polypeptide change effects of human variants remains challenging.

The budding yeast Saccharomyces cerevisiae is a widely used host cell for recombinant protein production owing to its well-studied and annotated genome, its ability to secrete large and post-translationally modified proteins, fast growth, and cost-effective culturing. However, recombinant protein yields from S. cerevisiae often fall behind those of other host systems.