Variants of uncertain significance (VUS) are variants that lack sufficient evidence to be confidently associated with a disease, and they pose a challenge in the interpretation of genetic testing results. Here we report an improved method for predicting the effect of VUS in the Arylsulfatase A (ARSA) gene as part of the Critical Assessment of Genome Interpretation challenge (CAGI6). Our method uses a transfer learning approach that leverages a pre-trained protein language model to predict the impact of mutations on the activity of the ARSA enzyme, whose deficiency is known to cause a rare genetic disorder, metachromatic leukodystrophy.
The Genetics of Neurodevelopmental Disorders Lab in Padua provided a new intellectual disability (ID) Panel challenge for computational methods to predict patient phenotypes and their causal variants in the context of the Critical Assessment of Genome Interpretation, 6th edition (CAGI6). Eight research teams submitted a total of 30 models to predict phenotypes based on the sequences of 74 genes (VCF format) in 415 pediatric patients affected by Neurodevelopmental Disorders (NDDs). NDDs are clinically and genetically heterogeneous conditions, with onset in infancy.
With the advent of artificial intelligence (AI), it is now possible to design diverse and novel molecules from previously unexplored chemical space. However, a challenge for chemists is the synthesis of such molecules. Recently, there have been attempts to develop AI models for retrosynthesis prediction, which rely on the availability of a high-quality training dataset.
Continued advances in variant effect prediction are necessary to demonstrate the ability of machine learning methods to accurately determine the clinical impact of variants of unknown significance (VUS). Towards this goal, the ARSA Critical Assessment of Genome Interpretation (CAGI) challenge was designed to characterize progress by utilizing 219 experimentally assayed missense VUS in the ARSA gene to assess the performance of community-submitted predictions of variant functional effects. The challenge involved 15 teams, and evaluated additional predictions from established and recently released models.
Background: A major obstacle faced by families with rare diseases is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years and causal variants are identified in under 50%, even when capturing variants genome-wide. To aid in the interpretation and prioritization of the vast number of variants detected, computational methods are proliferating.
The application of artificial intelligence (AI) in drug discovery has led to several success stories in recent times. While traditional methods mostly relied upon screening large chemical libraries for early-stage drug design, de novo design can help identify novel target-specific molecules by sampling from a much larger chemical space. Although this has increased the possibility of finding diverse and novel molecules from previously unexplored chemical space, it has also posed a great challenge for medicinal chemists: synthesizing at least some of the de novo designed molecules for experimental validation.
Generative artificial intelligence algorithms have been shown to be successful in exploring large chemical spaces and designing novel and diverse molecules. There has been considerable interest in developing predictive models using artificial intelligence for drug-like properties, which can potentially reduce the late-stage attrition of drug candidates or predict the properties of novel AI-designed molecules. Concurrently, it is important to understand the contribution of functional groups toward these properties and modify them to obtain property-optimized lead compounds.
Background: A major obstacle faced by rare disease families is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years, and causal variants are identified in under 50%. The Rare Genomes Project (RGP) is a direct-to-participant research study on the utility of genome sequencing (GS) for diagnosis and gene discovery.
Drug-induced gene expression profiling provides useful information covering various aspects of drug discovery and development. Most importantly, this knowledge can be used to discover drugs' mechanisms of action. Recently, deep learning-based drug design methods have been in the spotlight due to their ability to explore huge chemical space and design property-optimized target-specific drug molecules.
Mycobacterium tuberculosis (Mtb) is a pathogen of major concern due to its ability to withstand both first- and second-line antibiotics, leading to drug resistance. Thus, there is a critical need for identification of novel anti-tuberculosis agents targeting Mtb-specific proteins. The ceaseless search for novel antimicrobial agents to combat drug-resistant bacteria can be accelerated by the development of advanced deep learning methods, to explore both existing and uncharted regions of the chemical space.
In recent years, deep learning-based methods have emerged as promising tools for drug design. Most of these methods are ligand-based, where an initial target-specific ligand data set is necessary to design potent molecules with optimized properties. Although there have been attempts to develop alternative ways to design target-specific ligand data sets, availability of such data sets remains a challenge while designing molecules against novel target proteins.
Int J Neonatal Screen
June 2020
Short-chain acyl-CoA dehydrogenase deficiency (SCADD) is a rare autosomal recessive disorder of β-oxidation caused by pathogenic variants in the ACADS gene. Analyte testing for SCADD in blood and urine, including newborn screening (NBS) using tandem mass spectrometry (MS/MS) on dried blood spots (DBSs), is complicated by the presence of two relatively common variants (c.625G>A and c.
Public health newborn screening (NBS) programs provide population-scale ascertainment of rare, treatable conditions that require urgent intervention. Tandem mass spectrometry (MS/MS) is currently used to screen newborns for a panel of rare inborn errors of metabolism (IEMs). The NBSeq project evaluated whole-exome sequencing (WES) as an innovative methodology for NBS.
Introduction: Phenotype-driven rare disease gene prioritization relies on high quality curated resources containing disease, gene and phenotype annotations. However, the effectiveness of gene prioritization tools is constrained by the incomplete coverage of rare disease, phenotype and gene annotations in such curated resources.
Methods: We extracted rare disease correlation pairs involving diseases, phenotypes and genes from MEDLINE abstracts and used the information propagation algorithm GCAS to build an association network.
Genetics plays a key role in venous thromboembolism (VTE) risk; however, established risk factors in European populations do not translate to individuals of African descent because of differences in allele frequencies between populations. As part of the fifth iteration of the Critical Assessment of Genome Interpretation, participants were asked to predict VTE status in exome data from African American subjects. Participants were provided with 103 unlabeled exomes from patients treated with warfarin for non-VTE causes or VTE and asked to predict which disease each subject had been treated for.
Genomic variations in a reference collection are naturally represented as genome variation graphs. Such graphs encode common subsequences as vertices and the variations are captured using additional vertices and directed edges. The resulting graphs are directed graphs possibly with cycles.
Background: One of the major goals of genomic medicine is the identification of causal genomic variants in a patient and their relation to the observed clinical phenotypes. Prioritizing the genomic variants by considering only the genotype information usually identifies a few hundred potential variants. Narrowing it down further to find the causal disease genes and relating them to the observed clinical phenotypes remains a significant challenge, especially for rare diseases.
Background: Severe combined immunodeficiency (SCID) is characterized by arrested T-lymphocyte production and by B-lymphocyte dysfunction, which result in life-threatening infections. Early diagnosis of SCID through population-based screening of newborns can aid clinical management and help improve outcomes; it also permits the identification of previously unknown factors that are essential for lymphocyte development in humans.
Methods: SCID was detected in a newborn before the onset of infections by means of screening of T-cell-receptor excision circles, a biomarker for thymic output.
A brother and sister developed a previously undescribed constellation of autoimmune manifestations within their first year of life, with uncontrollable bullous pemphigoid, colitis, and proteinuria. The boy had hemophilia due to a factor VIII autoantibody and nephrotic syndrome. Both children required allogeneic hematopoietic cell transplantation (HCT), which resolved their autoimmunity.
Purpose: Severe combined immunodeficiency (SCID) encompasses a group of disorders characterized by reduced or absent T-cell number and function and identified by newborn screening utilizing T-cell receptor excision circles (TRECs). This screening has also identified infants with T lymphopenia who lack mutations in typical SCID genes. We report an infant with low TRECs and non-SCID T lymphopenia, who proved upon whole exome sequencing to have Nijmegen breakage syndrome (NBS).
Compressed sensing (CS) is a sparse signal sampling methodology for efficiently acquiring and reconstructing a signal from relatively few measurements. Recent work shows that CS is well-suited to be applied to problems in genomics, including probe design in microarrays, RNA interference (RNAi), and taxonomic assignment in metagenomics. The principle of using different CS recovery methods in these applications has thus been established, but a comprehensive study of using a wide range of CS methods has not been done.
Purpose: A male infant developed generalized rash, intestinal inflammation and severe infections including persistent cytomegalovirus. Family history was negative, T cell receptor excision circles were normal, and engraftment of maternal cells was absent. No defects were found in multiple genes associated with severe combined immunodeficiency.
Background: Pharmacovigilance aims to uncover and understand harmful side-effects of drugs, termed adverse events (AEs). Although the current process of pharmacovigilance is very systematic, the increasing amount of information available on specialized health-related websites, as well as the exponential growth in medical literature, presents a unique opportunity to supplement traditional adverse event gathering mechanisms with newer ones.
Method: We present a semi-automated pipeline to extract associations between drugs and side effects from traditional structured adverse event databases, enhanced by potential drug-adverse event pairs mined from user comments on health-related websites and from MEDLINE abstracts.