Publications by authors named "Nilah Ioannidis"

Article Synopsis
  • Kidney failure significantly impacts health, prompting a large-scale study of 406,504 participants to uncover genetic factors affecting kidney function; the study identified 430 key genetic loci.
  • The research revealed that 56% of inherited differences in kidney function are linked to regulatory elements in kidney tubule epithelial cells and 7% to regulatory elements in podocytes, suggesting that these inherited differences act largely through gene regulation in these cell types.
  • Further analysis using enhancer assays and CRISPR interference (CRISPRi) identified specific genes (NDRG1, CCNB1, and STC1) regulated by these genetic loci, shedding light on their roles in kidney function.
Article Synopsis
  • Deep learning models are used to predict epigenetic features, but their performance is weaker in cell-type-specific regions, which are crucial for gene regulation.
  • The study compares general-purpose and tissue-specific models, finding that models tailored to specific cells can improve the accuracy of chromatin accessibility predictions in those cells.
  • It emphasizes the need for new strategies to improve predictions of variant effects, because high accuracy on the reference sequence does not guarantee better variant effect predictions.
Article Synopsis
  • A variety of deep learning models are being developed to predict chromatin accessibility from DNA sequence, but evaluations often overlook cell-type-specific cis-regulatory elements (CREs), which are crucial for gene regulation and complex disease heritability.
  • The study evaluates the accuracy of these genomic models and finds that general-purpose models such as Enformer and Sei perform worse in regions that are accessible only in specific cell types.
  • Tailoring models to specific tissues and increasing their capacity to capture cell-type-specific regulation can boost performance. However, better predictions on reference sequences do not necessarily translate into better predictions of variant effects, which suggests that new approaches are needed in the field.
Article Synopsis
  • Kidney disease is largely influenced by genetics, yet the specific genes and mechanisms involved are still not fully understood; a recent GWAS identified 462 genetic loci associated with kidney function.
  • Researchers used single-cell ATAC-seq maps to explore chromatin accessibility in the kidney, finding that regulatory elements in kidney tubule epithelial cells accounted for the majority of genetic heritability related to kidney function.
  • The study further used CRISPR interference (CRISPRi) to show how inherited variation in regulatory elements affects gene expression in tubule epithelial cells, ultimately linking these differences to a predisposition to kidney disease in humans.

Gene therapies have the potential to treat disease by delivering therapeutic genetic cargo to disease-associated cells. One limitation to their widespread use is the lack of short regulatory sequences, or promoters, that differentially induce the expression of delivered genetic cargo in target cells, minimizing side effects in other cell types. Such cell-type-specific promoters are difficult to discover using existing methods, requiring either manual curation or access to large datasets of promoter-driven expression from both targeted and untargeted cells.

Regular, systematic, and independent assessment of computational tools used to predict the pathogenicity of missense variants is necessary to evaluate their clinical and research utility and suggest directions for future improvement. Here, as part of the sixth edition of the Critical Assessment of Genome Interpretation (CAGI) challenge, we assess missense variant effect predictors (or variant impact predictors) on an evaluation dataset of rare missense variants from disease-relevant databases. Our assessment evaluates predictors submitted to the CAGI6 Annotate-All-Missense challenge, predictors commonly used by the clinical genetics community, and recently developed deep learning methods for variant effect prediction.
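A common way to compare such predictors on a labeled evaluation set is the area under the ROC curve (AUROC). The abstract does not say which metrics the CAGI6 assessment used, so the metric choice here is an assumption, and the scores and labels below are hypothetical. The sketch computes AUROC as the fraction of (pathogenic, benign) pairs in which the pathogenic variant receives the higher score.

```python
def auroc(pathogenic_scores, benign_scores):
    """AUROC by pairwise comparison: the Mann-Whitney U statistic divided by
    the number of (pathogenic, benign) pairs. Ties count as half a win."""
    wins = 0.0
    for p in pathogenic_scores:
        for b in benign_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(pathogenic_scores) * len(benign_scores))


# Hypothetical predictor scores (higher = more likely pathogenic).
pathogenic = [0.91, 0.85, 0.40, 0.77]
benign = [0.12, 0.30, 0.45, 0.05]
print(auroc(pathogenic, benign))  # 0.9375: 15 of 16 pairs ranked correctly
```

The pairwise loop is O(n·m); for large evaluation sets, a rank-based computation gives the same value more efficiently.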

Genomic sequence-to-activity models are increasingly utilized to understand gene regulatory syntax and probe the functional consequences of regulatory variation. Current models make accurate predictions of relative activity levels across the human reference genome, but their performance is more limited for predicting the effects of genetic variants, such as explaining gene expression variation across individuals. To better understand the causes of these shortcomings, we examine the uncertainty in predictions of genomic sequence-to-activity models using an ensemble of Basenji2 model replicates.
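One simple way to express this kind of uncertainty is to measure how much independently trained replicates disagree about a single variant's predicted effect. The sketch below is not Basenji2 code. It assumes each replicate has already produced a predicted effect (taken here, as an assumption, to be alternate-allele minus reference-allele activity), and the numbers are hypothetical.

```python
from statistics import mean, stdev


def ensemble_summary(replicate_preds):
    """Summarize one variant's predicted effect across model replicates.

    replicate_preds: one predicted effect per independently trained replicate
    (assumed here to be alt-allele minus ref-allele predicted activity).
    """
    mu = mean(replicate_preds)
    sd = stdev(replicate_preds)  # spread across replicates = model uncertainty
    # Replicates agree on the direction of effect only if all share one sign.
    sign_agree = all(p > 0 for p in replicate_preds) or all(
        p < 0 for p in replicate_preds
    )
    return {"mean": mu, "sd": sd, "sign_agree": sign_agree}


# Hypothetical predicted effects for two variants from five replicates.
confident = [0.42, 0.39, 0.45, 0.40, 0.44]
uncertain = [0.10, -0.05, 0.08, -0.12, 0.02]
print(ensemble_summary(confident))
print(ensemble_summary(uncertain))
```

Checking sign agreement alongside the standard deviation separates variants whose effect size is uncertain from those whose direction of effect is itself unresolved across replicates.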

Genomic deep learning models can predict genome-wide epigenetic features and gene expression levels directly from DNA sequence. While current models perform well at predicting gene expression levels across genes in different cell types from the reference genome, their ability to explain expression variation between individuals due to cis-regulatory genetic variants remains largely unexplored. Here, we evaluate four state-of-the-art models on paired personal genome and transcriptome data and find limited performance when explaining variation in expression across individuals.

Computational genomics increasingly relies on machine learning methods for genome interpretation, and the recent adoption of neural sequence-to-function models highlights the need for rigorous model specification and controlled evaluation, problems familiar to other fields of AI. Research strategies that have greatly benefited other fields, including benchmarking, auditing, and algorithmic fairness, are also needed to advance the field of genomic AI and to facilitate model development. Here we propose a genomic AI benchmark, GUANinE, for evaluating model generalization across a number of distinct genomic tasks.

Article Synopsis
  • Genetic variation in humans significantly influences disease risk, yet many missense variants remain uncharacterized; this study develops a computational model leveraging saturation mutagenesis to predict the pathogenicity of these variants.
  • The model, called CPT-1, is trained on deep mutational scanning data from just five proteins and outperforms existing methods in clinical variant interpretation, particularly excelling in sensitivity and specificity for detecting disease-related variants.
  • By incorporating various predictive features from protein sequences and structures, the framework is versatile for future enhancements and has released predictions for missense variants in 90% of human genes, showcasing the potential of mutational scanning data in variant analysis.
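Sensitivity and specificity depend on the score threshold at which a variant is called pathogenic. The sketch below computes both at one threshold on a hypothetical labeled set. It does not use CPT-1 scores, and the 0.5 threshold is an arbitrary choice for illustration.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity at a single score threshold.

    labels: True for pathogenic, False for benign.
    A variant is called pathogenic when its score is >= threshold.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= threshold)
    sensitivity = tp / (tp + fn)  # fraction of pathogenic variants detected
    specificity = tn / (tn + fp)  # fraction of benign variants correctly passed
    return sensitivity, specificity


# Hypothetical scores and labels.
scores = [0.95, 0.80, 0.55, 0.30, 0.20, 0.60]
labels = [True, True, True, False, False, False]
sens, spec = sensitivity_specificity(scores, labels, threshold=0.5)
print(sens, spec)
```

Here all three pathogenic variants are detected (sensitivity 1.0), while one of the three benign variants scores above the threshold (specificity 2/3). Raising the threshold trades sensitivity for specificity.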

Fine-mapping methods, which aim to identify genetic variants responsible for complex traits following genetic association studies, typically assume that sufficient adjustments for confounding within the association study cohort have been made, e.g., through regressing out the top principal components (i.
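The principal-component adjustment mentioned above can be sketched as regressing the phenotype on the top PCs and keeping the residuals. This is a minimal illustration of that standard adjustment, not the method proposed in this work. It assumes the PCs are mutually orthogonal once centered (as PCs from a centered genotype matrix are), which lets each component be projected out in turn; the data are hypothetical.

```python
def center(values):
    mean = sum(values) / len(values)
    return [x - mean for x in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def regress_out_pcs(phenotype, pcs):
    """Return phenotype residuals after removing linear effects of the PCs.

    Assumes the centered PCs are mutually orthogonal, so removing each
    projection in turn equals a joint least-squares fit on all PCs.
    """
    resid = center(phenotype)  # centering removes the intercept
    for pc in pcs:
        pc_c = center(pc)
        beta = dot(resid, pc_c) / dot(pc_c, pc_c)  # least-squares slope on this PC
        resid = [r - beta * p for r, p in zip(resid, pc_c)]
    return resid


# Hypothetical data: 4 individuals, 2 orthogonal PCs.
phenotype = [3.0, 1.0, 4.0, 1.0]
pcs = [[1.0, -1.0, 1.0, -1.0], [1.0, 1.0, -1.0, -1.0]]
resid = regress_out_pcs(phenotype, pcs)
print(resid)  # [-0.25, 0.25, 0.25, -0.25]
```

The residuals are orthogonal to every PC, so any confounding that PCs do not capture linearly remains in the adjusted phenotype.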

The ability to deliver genetic cargo to human cells is enabling rapid progress in molecular medicine, but designing this cargo for precise expression in specific cell types is a major challenge. Expression is driven by regulatory DNA sequences within short synthetic promoters, but relatively few of these promoters are cell-type-specific. The ability to design cell-type-specific promoters using model-based optimization would be impactful for research and therapeutic applications.

Age is the primary risk factor for many common human diseases. Here, we quantify the relative contributions of genetics and aging to gene expression patterns across 27 tissues from 948 humans. We show that the predictive power of expression quantitative trait loci is impacted by age in many tissues.

Background: Family history of prostate cancer (PCa) is a well-known risk factor, and both common and rare genetic variants are associated with the disease.

Objective: To detect new genetic variants associated with PCa, capitalizing on the role of family history and more aggressive PCa.

Design, Setting, and Participants: A two-stage design was used.

Summary: Interpreting genetic variants of unknown significance (VUS) is essential in clinical applications of genome sequencing for diagnosis and personalized care. Non-coding variants remain particularly difficult to interpret, despite making up the large majority of trait associations identified in genome-wide association studies (GWAS). Predicting the regulatory effects of non-coding variants on candidate genes is a key step in evaluating their clinical significance.

Purpose: Limb-girdle muscular dystrophies (LGMD) are a genetically heterogeneous category of autosomal inherited muscle diseases. Many genes causing LGMD have been identified, and clinical trials are beginning for treatment of some genetic subtypes. However, even with the gene-level mechanisms known, it remains difficult to obtain robust and generalizable prevalence estimates for each subtype because epidemiological data are limited and the incidence of LGMDs is low.

Cutaneous squamous cell cancers (cSCCs) present an under-recognized health issue among non-Hispanic whites, one that is likely to increase as populations age. cSCC risks vary considerably among non-Hispanic whites, and this heterogeneity indicates the need for risk-stratified screening strategies that are guided by patients' personal characteristics and clinical histories. Here we describe cSCCscore, a prediction tool that uses patients' covariates and clinical histories to assign them personal probabilities of developing cSCCs within 3 years after risk assessment.

Cutaneous squamous cell carcinoma (cSCC) is a common skin cancer with genetic susceptibility loci identified in recent genome-wide association studies (GWAS). Transcriptome-wide association studies (TWAS) using imputed gene expression levels can identify additional gene-level associations. Here we impute gene expression levels in 6,891 cSCC cases and 54,566 controls in the Kaiser Permanente Genetic Epidemiology Research in Adult Health and Aging (GERA) cohort and 25,558 self-reported cSCC cases and 673,788 controls from 23andMe.

Background: The immune system has been implicated in the pathophysiology of cutaneous squamous cell carcinoma (cSCC), as evidenced by the substantially increased risk of cSCC in immunosuppressed individuals. Associations between cSCC risk and single nucleotide polymorphisms (SNPs) in the HLA region have been identified by genome-wide association studies (GWAS). How the associated HLA SNPs translate into amino acid changes in the structure of HLA molecules has not previously been elucidated.

Motivation: Interpreting genetic variation in noncoding regions of the genome is an important challenge for personal genome analysis. One mechanism by which noncoding single nucleotide variants (SNVs) influence downstream phenotypes is through the regulation of gene expression. Methods to predict whether or not individual SNVs are likely to regulate gene expression would aid interpretation of variants of unknown significance identified in whole-genome sequencing studies.

Cutaneous squamous cell carcinoma (cSCC) is the second most common cancer among Caucasians in the United States, with rising incidence over the past decade. Treatment for non-melanoma skin cancer, including cSCC, in the United States was estimated to cost $4.8 billion in 2014.

The vast majority of coding variants are rare, and assessment of the contribution of rare variants to complex traits is hampered by low statistical power and limited functional data. Improved methods for predicting the pathogenicity of rare coding variants are needed to facilitate the discovery of disease variants from exome sequencing studies. We developed REVEL (rare exome variant ensemble learner), an ensemble method for predicting the pathogenicity of missense variants that combines scores from 13 individual tools: MutPred, FATHMM, VEST, PolyPhen, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP, SiPhy, phyloP, and phastCons.

We report a genome-wide association study of cutaneous squamous cell carcinoma conducted among non-Hispanic white members of the Kaiser Permanente Northern California health care system. The study includes a genome-wide screen of 61,457 members (6,891 cases and 54,566 controls) genotyped on the Affymetrix Axiom European array and a replication phase involving an independent set of 6,410 additional members (810 cases and 5,600 controls). Combined analysis of screening and replication phases identified 10 loci containing single-nucleotide polymorphisms (SNPs) with P-values < 5 × 10⁻⁸.
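The 5 × 10⁻⁸ cutoff is the conventional genome-wide significance level, roughly a Bonferroni correction of 0.05 for about one million independent common-variant tests. The sketch below applies the cutoff and then groups significant SNPs into loci. The abstract does not state how loci were defined, so the distance-based grouping and the 500 kb window are assumptions, and the SNPs are hypothetical.

```python
GENOME_WIDE_P = 5e-8  # conventional genome-wide significance threshold


def significant_loci(snps, window_bp=500_000):
    """Group genome-wide significant SNPs into loci.

    snps: list of (chrom, position, p_value) tuples. A significant SNP on the
    same chromosome within window_bp of the current locus end joins that
    locus; otherwise it starts a new locus. window_bp is an assumed value.
    """
    hits = sorted((c, pos, p) for c, pos, p in snps if p < GENOME_WIDE_P)
    loci = []
    for chrom, pos, p in hits:
        last = loci[-1] if loci else None
        if last and last["chrom"] == chrom and pos - last["end"] <= window_bp:
            last["end"] = pos
            if p < last["lead_p"]:  # keep the most significant SNP as lead
                last["lead_pos"], last["lead_p"] = pos, p
        else:
            loci.append({"chrom": chrom, "start": pos, "end": pos,
                         "lead_pos": pos, "lead_p": p})
    return loci


# Hypothetical SNP results.
snps = [
    ("chr1", 1_000_000, 3e-9),
    ("chr1", 1_200_000, 1e-12),
    ("chr1", 5_000_000, 2e-8),
    ("chr2", 100, 0.01),
    ("chr6", 31_000_000, 4e-10),
]
for locus in significant_loci(snps):
    print(locus)
```

Distance-based grouping is the simplest option; LD-based clumping, which merges SNPs correlated with a lead variant, is a common alternative.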
