Publications by authors named "Sunyaev S"

Low-density lipoprotein cholesterol (LDL-C) is a well-established risk factor for cardiovascular disease, and it plays a causal role in the development of atherosclerosis. Genome-wide association studies (GWASs) have successfully identified hundreds of genetic variants associated with LDL-C. Most of these risk loci fall in non-coding regions of the genome, and it is unclear how these non-coding variants affect circulating lipid levels.

Gene networks encapsulate biological knowledge, often linked to polygenic diseases. While model system experiments generate many plausible gene networks, validating their role in human phenotypes requires evidence from human genetics. Rare variants provide the most straightforward path for such validation.

Using the Telomere-to-Telomere reference, we assembled the distribution of simple repeat lengths present in the human genome. Analyzing over two hundred mammalian genomes, we found remarkable consistency in the shape of the distribution across evolutionary epochs. All observed genomes harbor an excess of long repeats, which are prone to developing into repeat expansion disorders.

Mitochondrial DNA (mtDNA) has an important yet often overlooked role in health and disease. Constraint models quantify the removal of deleterious variation from the population by selection and represent powerful tools for identifying genetic variation that underlies human phenotypes. However, nuclear constraint models are not applicable to mtDNA, owing to its distinct features.

Article Synopsis
  • A study using deep whole-genome sequencing of brain neurons found that schizophrenia (SCZ) cases had more somatic mutations in regions of active gene expression than controls.
  • These somatic mutations, particularly at transcription factor binding sites, may affect gene expression related to SCZ and contribute to its development during brain formation.

Motivation: Functional Annotation of genomic Variants Online Resources (FAVOR) offers multi-faceted, whole genome variant functional annotations, which are essential for Whole Genome and Exome Sequencing (WGS/WES) analysis and the functional prioritization of disease-associated variants. A versatile chatbot is needed to facilitate informative interpretation and interactive, user-centric summaries of the whole genome variant functional annotation data in the FAVOR database.

Results: We have developed FAVOR-GPT, a generative natural language interface powered by integrating large language models (LLMs) and FAVOR.

Article Synopsis
  • Autoimmune and inflammatory diseases involve multiple genes and often share risk alleles, making it tough to pinpoint specific causes.
  • A study analyzing over 129,000 cases and controls found that about 40% of related genetic associations come from the same genetic variants across six different diseases.
  • By improving the resolution of genetic mapping, the researchers could identify more related gene expressions, suggesting that while there are common mechanisms between these diseases, there isn't just one universal cause for all autoimmune diseases.
Article Synopsis
  • Research is exploring whether neurodegenerative diseases caused by similar protein misfolding share genetic risk factors, but traditional studies lack the power to conclusively determine this.
  • By selecting patients based on their specific protein aggregation rather than just their clinical diagnosis, researchers can better identify genetic variants associated with diseases like Parkinson's and Alzheimer's.
  • The study finds that genetic modifiers related to alpha-synuclein and beta-amyloid contribute to shared risk factors in neurodegenerative diseases, indicating common underlying mechanisms across different conditions.
Article Synopsis
  • Recent advancements in genomics for diagnosing rare diseases focus on "N-of-1" analyses, allowing for tailored studies on individual patients with ultra-rare conditions.
  • The Undiagnosed Diseases Network (UDN) enables collaborative research across various U.S. clinical and research centers, which enhances the ability to analyze whole genome sequencing data from multiple patients simultaneously.
  • Introducing a new software package, RaMeDiES, the team provides tools for automated comparisons of genomic data, leading to novel disease associations and improving overall understanding of genetic links to these rare diseases.
Article Synopsis
  • Expansions of tandem repeats (TRs) are linked to about 60 genetic diseases, and finding more pathogenic repeats could improve disease diagnosis.
  • RExPRT (Repeat EXpansion Pathogenicity pRediction Tool) is a machine learning tool designed to differentiate harmful TR expansions from harmless ones.
  • The tool has shown impressive results, achieving an average precision of 93% and recall of 83%, making it helpful for prioritizing which genetic candidates to study further in large-scale research.
Article Synopsis
  • The study introduces a new method called LDSPEC to estimate the relationship between causal disease effect sizes of nearby SNPs, challenging the assumption that they are independent.
  • It analyzes data from 70 diseases in the UK Biobank, discovering significant correlations in effect sizes among proximal SNP pairs, which vary based on different factors such as distance and allele frequency.
  • The research finds that SNP pairs with related functions show stronger correlations extending over longer genomic distances, and it reveals that SNP-heritability estimates are lower than previously thought, indicating a discrepancy between expected and real genetic contributions to diseases.
Article Synopsis
  • The study investigates the relationships between causal disease effect sizes of proximal SNPs (single nucleotide polymorphisms) using a new method called LDSPEC, suggesting that these SNPs are not independent as previously thought.
  • By applying LDSPEC to data from 70 diseases in the UK Biobank, researchers found that the correlations in effect sizes between nearby SNPs varied based on distance, allele frequency, and linkage disequilibrium (LD), indicating complex interactions.
  • The results reveal that SNP pairs with shared functions show stronger correlations over longer distances, leading to a significant discrepancy between SNP-heritability estimates and the total variance of causal effect sizes, challenging prior assumptions in genetic research.
Article Synopsis
  • Extreme disease phenotypes, like infectious purpura fulminans (PF), can reveal important insights into common health conditions but are hard to study due to their rarity.
  • Researchers utilized a new method called the rare variant trend test (RVTT) to analyze genetic risk factors associated with PF, examining both prospective patient samples and historical records from large hospital systems.
  • They discovered a significant increase in low-frequency variants in the complement system among PF patients, linking these genetic changes to severe hyperinflammation in sepsis through loss and gain of function in complement receptors CR3 and CR4.

De novo mutations occur at substantially different rates depending on genomic location, sequence context and DNA strand. The success of methods to estimate selection intensity, infer demographic history and map rare disease genes depends strongly on assumptions about the local mutation rate. Here we present Roulette, a genome-wide mutation rate model at basepair resolution that incorporates known determinants of local mutation rate.

Personalized genome sequencing has revealed millions of genetic differences between individuals, but our understanding of their clinical relevance remains largely incomplete. To systematically decipher the effects of human genetic variants, we obtained whole-genome sequencing data for 809 individuals from 233 primate species and identified 4.3 million common protein-altering variants with orthologs in humans.

Personalized genome sequencing has revealed millions of genetic differences between individuals, but our understanding of their clinical relevance remains largely incomplete. To systematically decipher the effects of human genetic variants, we obtained whole genome sequencing data for 809 individuals from 233 primate species, and identified 4.3 million common protein-altering variants with orthologs in human.

Recurrent mutation produces multiple copies of the same allele, which may co-segregate in a population. Yet most analyses of allele-frequency or site-frequency spectra assume that all observed copies of an allele trace back to a single mutation. We develop a sampling theory for the number of latent mutations in the ancestry of a rare variant, specifically a variant observed in a relatively small count in a large sample.

Non-B DNA structures formed by repetitive sequence motifs are known instigators of mutagenesis in experimental systems. Analyzing this phenomenon computationally in the human genome requires careful disentangling of intrinsic confounding factors, including overlapping and interrupted motifs and recurrent sequencing errors. Here, we show that accounting for these factors eliminates all signals of repeat-induced mutagenesis that extend beyond the motif boundary, and eliminates or dramatically shrinks the magnitude of mutagenesis within some motifs, contradicting previous reports.

Genetic association studies of many heritable traits resulting from physiological testing often have modest sample sizes due to the cost and burden of the required phenotyping. This reduces statistical power and limits discovery of multiple genetic associations. We present a strategy that leverages pleiotropy between traits both to discover new loci and to provide mechanistic hypotheses of the underlying pathophysiology.

Article Synopsis
  • The genetic basis of traits is mainly polygenic and influenced by non-coding alleles, which are thought to have minor regulatory roles in gene expression.
  • Despite having access to extensive gene expression and epigenomic data, few connections between genetic variants and gene activity have been established.
  • A study identified 220 gene-trait pairs influenced by protein-coding variants, revealing little evidence that typical gene expression explains associations with complex traits, indicating a need for improved models to understand these complexities.

Large biobank-scale whole genome sequencing (WGS) studies are rapidly identifying a multitude of coding and non-coding variants. They provide an unprecedented resource for illuminating the genetic basis of human diseases. Variant functional annotations play a critical role in WGS analysis, result interpretation, and prioritization of disease- or trait-associated causal variants.

A multitude of demographic, health, and genetic factors are associated with the risk of developing severe COVID-19 following infection by SARS-CoV-2. There is a need to perform studies across human societies and to investigate the full spectrum of genetic variation of the virus. Using data from 869 COVID-19 patients in Bahrain between March 2020 and March 2021, we analyzed paired viral sequencing and non-genetic host data to understand host and viral determinants of severe COVID-19.

Although genomic sequencing is rapidly transforming from a bench-side tool into a routine hospital procedure, there is a noticeable lack of genomic analysis software that supports both clinical and research workflows as well as crowdsourcing. Furthermore, most existing software packages are not forward-compatible with the ever-changing diagnostic rules adopted by the genetics community. Regular updates of genomics databases pose challenges for reproducible and traceable automated genetic diagnostics tools.

Rare copy-number variants (rCNVs) include deletions and duplications that occur infrequently in the global human population and can confer substantial risk for disease. In this study, we aimed to quantify the properties of haploinsufficiency (i.e.
