Publications by authors named "Bejerano G"

Fins are major functional appendages of fish that have been repeatedly modified in different lineages. To search for genomic changes underlying natural fin diversity, we compared the genomes of 36 percomorph fish species that span over 100 million years of evolution and either have complete or reduced pelvic and caudal fins. We identify 1,614 genomic regions that are well-conserved in fin-complete species but missing from multiple fin-reduced lineages.
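
The selection rule in this abstract (conserved in fin-complete species, missing from multiple fin-reduced lineages) can be written as a simple filter. The sketch below is a hypothetical illustration, not the authors' pipeline. It assumes presence/absence calls per region and species have already been made, treats "well-conserved" simply as "present in every fin-complete species", and takes "multiple" to mean at least two lineages. All function names, region IDs and species labels are made up.

```python
from typing import Dict, List, Set


def find_candidate_regions(
    presence: Dict[str, Set[str]],
    fin_complete: Set[str],
    fin_reduced_lineages: List[Set[str]],
    min_lost_lineages: int = 2,
) -> List[str]:
    """Return regions present in every fin-complete species but absent from
    every species of at least `min_lost_lineages` fin-reduced lineages.

    Hypothetical sketch: `presence` maps a region ID to the set of species in
    which that region was detected (presence calls are assumed precomputed).
    """
    hits = []
    for region, species_with_region in presence.items():
        # Stand-in for "well-conserved": present in all fin-complete species.
        if not fin_complete <= species_with_region:
            continue
        # A lineage has lost the region if none of its species retains it.
        lost = sum(1 for lineage in fin_reduced_lineages
                   if not (lineage & species_with_region))
        if lost >= min_lost_lineages:
            hits.append(region)
    return sorted(hits)


# Toy example with made-up species and region IDs.
presence = {
    "r1": {"A", "B", "C"},
    "r2": {"A", "B", "X", "Y"},
    "r3": {"A", "X"},
}
print(find_candidate_regions(presence, fin_complete={"A", "B"},
                             fin_reduced_lineages=[{"X"}, {"Y"}, {"Z"}]))  # ['r1']
```

In the toy data, r1 is kept because all three fin-reduced lineages lack it, r2 is dropped because lineages X and Y retain it, and r3 is dropped because fin-complete species B lacks it.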

Background: Long-read sequencing methods have been used to identify previously uncharacterized structural variants that cause human genetic diseases. We therefore investigated whether long-read sequencing could facilitate genetic analysis of murine models of human disease.

Results: The genomes of six inbred strains (BTBR T+ Itpr3tf/J, 129Sv1/J, C57BL/6/J, Balb/c/J, A/J, SJL/J) were analyzed using long-read sequencing.

Fins are major functional appendages of fish that have been repeatedly modified in different lineages. To search for genomic changes underlying natural fin diversity, we compared the genomes of 36 wild fish species that either have complete or reduced pelvic and caudal fins. We identify 1,614 genomic regions that are well-conserved in fin-complete species but missing from multiple fin-reduced lineages.

We present WhichTF, a computational method to identify functionally important transcription factors (TFs) from chromatin accessibility measurements. To rank TFs, WhichTF applies an ontology-guided functional approach to compute novel enrichment by integrating accessibility measurements, high-confidence pre-computed conservation-aware TF binding sites, and putative gene-regulatory models. Comparison with prior methods based on sheer abundance reveals WhichTF's unique ability to identify context-specific TFs with functional relevance, including NF-κB family members in lymphocytes and GATA factors in cardiac cells.

Purpose: Cohort building is a powerful foundation for improving clinical care, performing biomedical research, recruiting for clinical trials, and many other applications. We set out to build a cohort of all monogenic patients with a definitive causal gene diagnosis in a 3-million-patient hospital system.

Methods: We define a subset of 4,461 OMIM diseases that have at least one known monogenic causal gene.

Stopgain substitutions are the third-largest class of monogenic human disease mutations and often examined first in patient exomes. Existing computational stopgain pathogenicity predictors, however, exhibit poor performance at the high sensitivity required for clinical use. Here, we introduce a new classifier, termed X-CAP, which uses a novel training methodology and unique feature set to improve the AUROC by 18% and decrease the false-positive rate 4-fold on large variant databases.
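
Because this abstract frames performance in terms of AUROC and of the false-positive rate at the high sensitivity needed for clinical use, the sketch below shows how both quantities can be computed from classifier scores. It is a generic illustration, not X-CAP itself: the 95% sensitivity target and the toy scores are assumptions, and the code assumes both pathogenic (label 1) and benign (label 0) variants are present. The pairwise AUROC loop is quadratic, which is fine for illustration; for large variant databases a library routine such as scikit-learn's `roc_auc_score` is the practical choice.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the probability that a random pathogenic variant (label 1)
    scores higher than a random benign one (label 0); ties count as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fpr_at_sensitivity(scores: Sequence[float], labels: Sequence[int],
                       target_sensitivity: float = 0.95) -> float:
    """Lowest false-positive rate over score thresholds whose sensitivity
    (true-positive rate) is at least `target_sensitivity`."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best = 1.0
    for t in sorted(set(scores)):
        # Call a variant pathogenic when its score is >= t.
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        if tp / n_pos >= target_sensitivity:
            best = min(best, fp / n_neg)
    return best


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
print(round(auroc(scores, labels), 3))               # 0.889
print(round(fpr_at_sensitivity(scores, labels), 3))  # 0.333
```

A classifier can have a respectable AUROC yet a poor false-positive rate once sensitivity is pinned near 100%, which is why the two numbers are reported separately.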

We present Champagne, a whole-genome method for generating character matrices for phylogenomic analysis using large genomic indel events. By rigorously picking orthologous genes and locating large insertion and deletion events, Champagne delivers a character matrix that considerably reduces homoplasy compared with morphological and nucleotide-based matrices, on both established phylogenies and difficult-to-resolve nodes in the mammalian tree. Champagne provides ample evidence, in the form of genomic structural variation, for incomplete lineage sorting and possible introgression in Paenungulata and in the human-chimp-gorilla clade, phenomena previously inferred primarily from matrices composed of aligned single-nucleotide characters.
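
To make the idea of an indel-based character matrix concrete, the sketch below encodes each large insertion or deletion as a binary character and applies the classic unrooted four-gamete compatibility test, one simple way to flag pairs of characters that cannot both fit a single tree without homoplasy. This is an assumed illustration, not Champagne's implementation: orthology assignment and indel detection are taken as already done, and the event IDs, taxa and helper names are hypothetical.

```python
from typing import Dict, List, Optional, Set


def build_indel_matrix(
    taxa: List[str],
    events: Dict[str, Set[str]],
    missing: Optional[Dict[str, Set[str]]] = None,
) -> Dict[str, str]:
    """Binary character matrix with one column per indel event (sorted by ID):
    '1' if the taxon carries the event, '0' if not, '?' if unscorable there."""
    missing = missing or {}
    event_ids = sorted(events)
    matrix = {}
    for taxon in taxa:
        row = []
        for ev in event_ids:
            if taxon in missing.get(ev, set()):
                row.append("?")
            elif taxon in events[ev]:
                row.append("1")
            else:
                row.append("0")
        matrix[taxon] = "".join(row)
    return matrix


def compatible(matrix: Dict[str, str], i: int, j: int) -> bool:
    """Unrooted four-gamete test: characters i and j fit one tree without
    homoplasy unless all four state pairs occur among taxa scored for both."""
    pairs = {(row[i], row[j]) for row in matrix.values() if "?" not in (row[i], row[j])}
    return len(pairs) < 4


taxa = ["human", "chimp", "gorilla", "mouse"]
events = {"del1": {"human", "chimp"}, "ins2": {"chimp", "gorilla"}}
m = build_indel_matrix(taxa, events)
print(m)                    # {'human': '10', 'chimp': '11', 'gorilla': '01', 'mouse': '00'}
print(compatible(m, 0, 1))  # False
```

In the toy data, del1 groups human with chimp while ins2 groups chimp with gorilla, so the pair fails the test; such conflicts can reflect incomplete lineage sorting, introgression, or homoplasy.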

The ability to use genome-wide association studies (GWAS) for genetic discovery depends upon our ability to distinguish true causative from false positive association signals. Population structure (PS) has been shown to cause false positive signals in GWAS. PS correction is routinely used for analysis of human GWAS results, and it has been assumed that it also should be utilized for murine GWAS using inbred strains.

Article Synopsis
  • About 70% of patients with suspected Mendelian diseases remain undiagnosed even after genome sequencing due to incomplete knowledge about pathogenic genes, and generating new gene hypotheses can be slow without cohort analysis.
  • InpherNet is a new machine learning tool that uses network data from the Monarch Initiative to efficiently rank candidate genes based on various biological connections and evidence.
  • InpherNet outperforms existing gene ranking methods by correctly identifying causative genes more often, even in cases with little prior clinical evidence, thus enhancing the diagnosis of monogenic diseases.

DNA profiling has become an essential tool for crime solving and prevention, and CODIS (Combined DNA Index System) criminal investigation databases have flourished at the national, state and even local level. However, reports suggest that the DNA profiles of all suspects searched in these databases are often retained, which could result in racial profiling. Here, we devise an approach to both enable broad DNA profile searches and preserve exonerated citizens' privacy through a real-time privacy-preserving procedure to query CODIS databases.

Sense organs acquire their distinctive shapes concomitantly with the differentiation of sensory cells and neurons necessary for their function. Although our understanding of the mechanisms controlling morphogenesis and neurogenesis in these structures has grown, how these processes are coordinated remains largely unexplored. Neurogenesis in the zebrafish olfactory epithelium requires the bHLH proneural transcription factor Neurogenin 1 (Neurog1).

We are only just beginning to catalog the vast diversity of cell types in the cerebral cortex. Such categorization is a first step toward understanding how diversification relates to function. All cortical projection neurons arise from a uniform pool of progenitor cells that lines the ventricles of the forebrain.

Gene losses provide an insightful route for studying the morphological and physiological adaptations of species, but their discovery is challenging. Existing genome annotation tools focus on annotating intact genes and do not attempt to distinguish nonfunctional genes from genes missing annotation due to sequencing and assembly artifacts. Previous attempts to annotate gene losses have required significant manual curation, which hampers their scalability for the ever-increasing deluge of newly sequenced genomes.

The diagnosis of Mendelian disorders requires labor-intensive literature research. Trained clinicians can spend hours looking for the right publication(s) supporting a single gene that best explains a patient's disease. AMELIE (Automatic Mendelian Literature Evaluation) greatly accelerates this process.

Distantly related species entering similar biological niches often adapt by evolving similar morphological and physiological characters. How much genomic molecular convergence (particularly of highly constrained coding sequence) contributes to convergent phenotypic evolution, such as echolocation in bats and whales, is a long-standing fundamental question. Like others, we find that convergent amino acid substitutions are not more abundant in echolocating mammals compared to their outgroups.
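
Comparisons like the one in this abstract start from a count of convergent amino acid substitutions for a pair of lineages. The sketch below assumes ancestral sequences have already been reconstructed and aligned for each lineage, and uses a simplified definition (both branches change away from their own ancestral residue and end at the same amino acid). It lumps parallel and convergent changes together and omits any statistical comparison with outgroups; the function name and sequences are hypothetical.

```python
def count_convergent_sites(anc1: str, desc1: str, anc2: str, desc2: str) -> int:
    """Count alignment columns where two independent lineages each substituted
    away from their ancestral residue and reached the same derived amino acid."""
    if not (len(anc1) == len(desc1) == len(anc2) == len(desc2)):
        raise ValueError("all four sequences must be aligned to the same length")
    count = 0
    for a1, d1, a2, d2 in zip(anc1, desc1, anc2, desc2):
        if "-" in (a1, d1, a2, d2):
            continue  # skip columns with alignment gaps
        if d1 != a1 and d2 != a2 and d1 == d2:
            count += 1
    return count


# Toy aligned protein fragments (made up): only column 2 (K->R in both) counts.
print(count_convergent_sites("MKLV", "MRLA", "MKIV", "MRIS"))  # 1
```

Running the same count on an echolocating pair and on a matched non-echolocating control pair yields the kind of numbers that such studies compare.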

Objective: To investigate the characteristics and risk factors of a novel parenchymal lung disease (LD), increasingly detected in systemic juvenile idiopathic arthritis (sJIA).

Methods: In a multicentre retrospective study, 61 cases were investigated using physician-reported clinical information and centralised analyses of radiological, pathological and genetic data.

Results: LD was associated with distinctive features, including acute erythematous clubbing and a high frequency of anaphylactic reactions to the interleukin (IL)-6 inhibitor, tocilizumab.

Population-based biobanks with genomic and dense phenotype data provide opportunities for generating effective therapeutic hypotheses and understanding the genomic role in disease predisposition. To characterize latent components of genetic associations, we apply truncated singular value decomposition (DeGAs) to matrices of summary statistics derived from genome-wide association analyses across 2,138 phenotypes measured in 337,199 White British individuals in the UK Biobank study. We systematically identify key components of genetic associations and the contributions of variants, genes, and phenotypes to each component.
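
Truncated singular value decomposition, the decomposition DeGAs applies, factors a variants x phenotypes matrix of summary statistics into a small number of latent components. The sketch below shows that step with NumPy on random toy data. The contribution scores are defined here as squared component scores normalized within each component, an assumption for illustration; the preprocessing used for the UK Biobank statistics (variant filtering, choice of statistic, number of components) is not reproduced, and the function name is hypothetical.

```python
import numpy as np


def truncated_svd_components(z: np.ndarray, k: int):
    """Rank-k truncated SVD of a variants x phenotypes matrix of GWAS summary
    statistics. Returns the top-k singular values and, for each component,
    the fraction it draws from each variant and from each phenotype."""
    u, s, vt = np.linalg.svd(z, full_matrices=False)  # s sorted in descending order
    u_k, s_k, v_k = u[:, :k], s[:k], vt[:k].T
    variant_scores = u_k * s_k    # variant coordinates on each latent component
    phenotype_scores = v_k * s_k  # phenotype coordinates on each latent component
    # Squared scores normalized per component (columns sum to 1); assumes s_k > 0.
    variant_contrib = variant_scores**2 / (variant_scores**2).sum(axis=0)
    phenotype_contrib = phenotype_scores**2 / (phenotype_scores**2).sum(axis=0)
    return s_k, variant_contrib, phenotype_contrib


rng = np.random.default_rng(0)
z = rng.normal(size=(200, 12))  # toy stand-in: 200 variants x 12 phenotypes
s_k, var_c, phe_c = truncated_svd_components(z, k=3)
print(s_k.shape, var_c.shape, phe_c.shape)  # (3,) (200, 3) (12, 3)
```

A full `np.linalg.svd` is only practical for small matrices; at biobank scale an iterative solver that computes just the top components, such as `scipy.sparse.linalg.svds` or scikit-learn's `TruncatedSVD`, would take its place.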

Purpose: Both monogenic pathogenic variant cataloging and clinical patient diagnosis start with variant-level evidence retrieval followed by expert evidence integration in search of diagnostic variants and genes. Here, we try to accelerate pathogenic variant evidence retrieval by an automatic approach.

Methods: Automatic VAriant evidence DAtabase (AVADA) is a novel machine learning tool that uses natural language processing to automatically identify pathogenic genetic variant evidence in full-text primary literature about monogenic disease and convert it to genomic coordinates.

It is estimated that 350 million individuals worldwide suffer from rare diseases, which are predominantly caused by mutation in a single gene. The current molecular diagnostic rate is estimated at 50%, with whole-exome sequencing (WES) among the most successful approaches. For patients in whom WES is uninformative, RNA sequencing (RNA-seq) has shown diagnostic utility in specific tissues and diseases.

Human neural stem cells (NSCs) offer therapeutic potential for neurodegenerative diseases, such as inherited monogenic nervous system disorders, and neural injuries. Gene editing in NSCs (GE-NSCs) could enhance their therapeutic potential. We show that NSCs are amenable to gene targeting at multiple loci using Cas9 mRNA with synthetic chemically modified guide RNAs along with DNA donor templates.

Exome analysis of patients with a likely monogenic disease does not identify a causal variant in over half of cases. Splice-disrupting mutations make up the second-largest class of known disease-causing mutations. Each individual (singleton) exome harbors over 500 rare variants of unknown significance (VUS) in the splicing region.

Purpose: Diagnosing monogenic diseases facilitates optimal care, but can involve the manual evaluation of hundreds of genetic variants per case. Computational tools like Phrank expedite this process by ranking all candidate genes by their ability to explain the patient's phenotypes. To use these tools, busy clinicians must manually encode patient phenotypes from lengthy clinical notes.

Experimental detection of RNA splicing branchpoints is difficult. To date, high-confidence experimental annotations exist for 18% of 3' splice sites in the human genome. We develop a deep-learning-based branchpoint predictor, LaBranchoR, which predicts a correct branchpoint for at least 75% of 3' splice sites genome-wide.

Genetic variation in cis-regulatory elements is thought to be a major driving force in morphological and physiological changes. However, identifying transcription factor binding events that code for complex traits remains a challenge, motivating novel means of detecting putatively important binding events. Using a curated set of 1154 high-quality transcription factor motifs, we demonstrate that independently eroded binding sites are enriched for independently lost traits in three distinct pairs of placental mammals.
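
The abstract does not say which statistical test was used. A common way to ask whether independently eroded binding sites are over-represented near genes tied to a lost trait is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test on the 2x2 table). The sketch below assumes that framing; the counts are invented, the mapping from binding sites to trait-associated genes is not shown, and the function name is hypothetical.

```python
from math import comb


def enrichment_p_value(total: int, annotated: int, selected: int, overlap: int) -> float:
    """One-sided hypergeometric P(X >= overlap): the chance that `selected` items
    drawn at random from `total` include at least `overlap` of the `annotated` ones."""
    upper = min(annotated, selected)
    tail = sum(comb(annotated, x) * comb(total - annotated, selected - x)
               for x in range(overlap, upper + 1))
    return tail / comb(total, selected)


# Hypothetical counts: 1,000 binding sites, 50 near genes linked to the lost trait,
# 40 independently eroded in a trait-loss species pair, 8 falling in both groups.
print(f"{enrichment_p_value(1000, 50, 40, 8):.2e}")
```

Here "annotated" stands for sites near trait-associated genes and "selected" for sites eroded in both trait-loss lineages; a small p-value indicates more overlap than random draws would give.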

Article Synopsis
  • Protein-coding sequence makes up only about 2% of the human genome, yet most known disease-causing variants have been found in exons or splice sites, so identifying the causes of some disorders requires looking beyond coding regions.
  • A case study of an Afghan male with Wilson disease found no harmful variants in the coding region of ATP7B, the known causative gene; further analysis, however, revealed a variant in the ATP7B promoter that disrupts a site crucial for gene regulation.
  • This discovery highlights the importance of investigating non-coding genetic variants and suggests that similar methods could be used to identify other disease-associated non-coding variants.