Publications by authors named "Benhur A"

Complex deep learning models trained on very large datasets have become key enabling tools for current research in natural language processing and computer vision. By providing pre-trained models that can be fine-tuned for specific applications, they enable researchers to create accurate models with minimal effort and computational resources. Large-scale genomics deep learning models come in two flavors: the first are large language models of DNA sequences trained in a self-supervised fashion, similar to the corresponding natural language models; the second are supervised learning models that leverage large-scale genomics datasets from ENCODE and other sources.

Disruptions in spatiotemporal gene expression can result in atypical brain function. Specifically, autism spectrum disorder (ASD) is characterized by abnormalities in pre-mRNA splicing. Abnormal splicing patterns have been identified in the brains of individuals with ASD, and mutations in splicing factors have been found to contribute to neurodevelopmental delays associated with ASD.

Article Synopsis
  • Scientists studied how tiny living things called microbes break down dead bodies in different places.
  • They found that these microbes work together in a special way to recycle the materials from the bodies, even though the climate and location can change.
  • This research could help figure out how long someone has been dead by looking at the types of microbes present.

Identification of the gene expression state of a cancer patient from routine pathology imaging and characterization of its phenotypic effects have significant clinical and therapeutic implications. However, prediction of expression of individual genes from whole slide images (WSIs) is challenging due to co-dependent or correlated expression of multiple genes. Here, we use a purely data-driven approach to first identify groups of genes with co-dependent expression and then predict their status from WSIs using a bespoke graph neural network.

The prediction of a protein 3D structure is essential for understanding protein function, drug discovery, and disease mechanisms; with the advent of methods like AlphaFold that are capable of producing very high-quality decoys, ensuring the quality of those decoys can provide further confidence in the accuracy of their predictions. In this work, we describe Q, a graph convolutional network (GCN) that utilizes a minimal set of atom and residue features as inputs to predict the global distance test total score (GDTTS) and local distance difference test (lDDT) score of a decoy. To improve the model's performance, we introduce a novel loss function based on the ε-insensitive loss function used for SVM regression.
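For readers unfamiliar with the loss mentioned above, the following is a minimal sketch of the standard ε-insensitive loss from SVM regression, which ignores errors smaller than ε and penalizes larger errors linearly. It is not the novel loss the abstract refers to, whose details are not given here; the function name and the default `epsilon` value are illustrative assumptions.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Mean of max(0, |y - y_hat| - epsilon) over paired values.

    Standard SVM-regression loss, shown for illustration only; the
    name and the default epsilon are assumptions, not from the paper.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")
    # Errors inside the epsilon tube contribute zero loss.
    total = sum(max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred))
    return total / len(y_true)
```

For example, with ε = 0.1 a predicted score that is off by 0.05 contributes no loss, while one that is off by 0.3 contributes 0.2.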

Background: Alternative splicing is a widespread regulatory phenomenon that enables a single gene to produce multiple transcripts. Among the different types of alternative splicing, intron retention is one of the least explored despite its high prevalence in both plants and animals. The recent discovery that the majority of splicing is co-transcriptional has led to the finding that chromatin state affects alternative splicing.

Article Synopsis
  • This study explored the use of sonographic assessment of optic nerve sheath diameter (ONSD) as a non-invasive method for monitoring intracranial pressure (ICP) in patients undergoing elective craniotomy for intracranial tumors.
  • It aimed to measure changes in ONSD compared to pre-operative values over the first 3 postoperative days and to analyze its correlation with the Glasgow Coma Scale (GCS) and post-operative CT findings.
  • Results showed significant fluctuations in ONSD during the postoperative period, with an initial increase followed by a decrease, indicating a potential link between ONSD and patient recovery as measured by GCS.

As practitioners of machine learning in the area of bioinformatics, we know that the quality of the results crucially depends on the quality of our labeled data. While there is a tendency to focus on the quality of positive examples, the negative examples are equally important. In this opinion paper, we revisit the problem of choosing negative examples for the task of predicting protein-protein interactions, either among proteins of a given species or for host-pathogen interactions, and describe important issues that are prevalent in the current literature.

Motivation: Machine-learning-based prediction of compound-protein interactions (CPIs) is important for drug design, screening and repurposing. Despite numerous recent publications with increasing methodological sophistication claiming consistent improvements in predictive accuracy, we have observed a number of fundamental issues in experiment design that produce overoptimistic estimates of model performance.

Results: We systematically analyze the impact of several factors affecting the generalization performance of CPI predictors that are overlooked in existing work: (i) similarity between training and test examples in cross-validation; (ii) synthesizing negative examples in the absence of experimentally verified negative examples; and (iii) alignment of the evaluation protocol and performance metrics with real-world use of CPI predictors in screening large compound libraries.
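Point (i) above can be illustrated with a minimal sketch of group-aware fold assignment, in which examples that share a similarity group never land in different folds. This is only one possible way to limit train/test similarity and is not claimed to be the protocol used in the paper; the function name `grouped_folds` and the assumption that group labels come from some prior similarity clustering are hypothetical.

```python
from collections import defaultdict


def grouped_folds(groups, n_folds):
    """Assign example indices to folds so that no group spans two folds.

    `groups[i]` is the (hypothetical) similarity-cluster label of example i.
    Largest groups are placed first, each into the currently smallest fold,
    to keep fold sizes roughly balanced.
    """
    if n_folds < 1:
        raise ValueError("n_folds must be at least 1")
    members = defaultdict(list)
    for idx, label in enumerate(groups):
        members[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(members, key=lambda g: -len(members[g])):
        smallest = min(folds, key=len)
        smallest.extend(members[label])
    return folds
```

Each fold can then serve once as the test set, with the remaining folds used for training, so that near-duplicate examples do not leak between the two.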

Background: Despite recent progress in basecalling of Oxford Nanopore DNA sequencing data, its wide adoption is still hampered by its relatively low accuracy compared to short-read technologies. Furthermore, very little recent research has focused on basecalling of RNA data, which has different characteristics than its DNA counterpart.

Results: We fill this gap by benchmarking a fully convolutional deep learning basecalling architecture with improved performance compared to Oxford Nanopore's RNA basecallers.

Histone proteins compact and organize DNA resulting in a dynamic chromatin architecture impacting DNA accessibility and ultimately gene expression. Eukaryotic chromatin landscapes are structured through histone protein variants, epigenetic marks, the activities of chromatin-remodeling complexes, and post-translational modification of histone proteins. In most Archaea, histone-based chromatin structure is dominated by the helical polymerization of histone proteins wrapping DNA into a repetitive and closely gyred configuration.

Deep learning has demonstrated its predictive power in modeling complex biological phenomena such as gene expression. The value of these models hinges not only on their accuracy, but also on the ability to extract biologically relevant information from the trained models. While there has been much recent work on developing feature attribution methods that discover the most important features for a given sequence, inferring cooperativity between regulatory elements, which is the hallmark of phenomena such as gene expression, remains an open problem.

Objective: The purpose of this study is to understand the impact of the cationic polymer Merquat on the rheological behavior of the mixed surfactant system of sodium lauryl ether sulfate (SLES) and cocamidopropyl betaine (CapB), as well as the impact of varying formulation conditions on the wet lubrication performance of the SLES-CapB-Merquat system.

Methods: Rotational mechanical rheometry was used to study the rheological response of the SLES-CapB-Merquat systems. Frequency sweeps were conducted to analyze the rheological properties of the system at low frequency ranges, and the bulk viscosity of the system was studied at high shear rates at varying salt and polymer concentrations.

Increased public awareness of the ingredients in cosmetic and personal care formulations, coupled with growing concern about the dwindling nonrenewable sources from which most cosmetic ingredients, such as surfactants and polymers, are obtained, has led to a strong need for sustainability within the cosmetic industry. Sustainability now needs to be incorporated at every point of the product life cycle. This review focuses on the sustainable sourcing and formulation design of two key cosmetic ingredients: polymers and surfactants.

Next-generation sequencing (NGS) technologies - Illumina RNA-seq, Pacific Biosciences isoform sequencing (PacBio Iso-seq), and Oxford Nanopore direct RNA sequencing (DRS) - have revealed the complexity of plant transcriptomes and their regulation at the co-/post-transcriptional level. Global analysis of mature mRNAs, transcripts from nuclear run-on assays, and nascent chromatin-bound mRNAs using short as well as full-length and single-molecule DRS reads have uncovered potential roles of different forms of RNA polymerase II during the transcription process, and the extent of co-transcriptional pre-mRNA splicing and polyadenylation. These tools have also allowed mapping of transcriptome-wide start sites in cap-containing RNAs, poly(A) site choice, poly(A) tail length, and RNA base modifications.

Breast cancer is the second leading cause of death in women above 60 years in the US. Screening mammography is recommended for women above 50 years; however, 22% of breast cancer cases are diagnosed in women below this age. We set out to develop a test based on the detection of cell-free RNA from saliva.

Objective: The purpose of this study was to understand the impact of the biopolymer chitosan on the rheological behaviour of the biosurfactant sophorolipid as well as the effects of ionization and electrolyte addition on the chitosan-sophorolipid system.

Methods: Rotational mechanical rheometry was used to study the rheological response of the chitosan-SL systems. Frequency sweeps were conducted to analyse the rheological properties of the system at low-frequency ranges, and bulk viscosity of the system was studied at high shear rates for each sample.

Efforts to develop effective and safe drugs for treatment of tuberculosis require preclinical evaluation in animal models. Alongside efficacy testing of novel therapies, effects on pulmonary pathology and disease progression are monitored by using histopathology images from these infected animals. To compare the severity of disease across treatment cohorts, pathologists have historically assigned a semi-quantitative histopathology score that may be subjective in terms of their training, experience, and personal bias.

Drought is a major limiting factor of crop yields. In response to drought, plants reprogram their gene expression, which ultimately regulates a multitude of biochemical and physiological processes. The timing of this reprogramming and the nature of the drought-regulated genes in different genotypes are thought to confer differential tolerance to drought stress.

Background: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function.

Results: Here, we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed and the types of analysis performed. In a novel and major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes.

Motivation: Deep learning architectures have recently demonstrated their power in predicting DNA- and RNA-binding specificity. Existing methods fall into three classes: some are based on convolutional neural networks (CNNs), others use recurrent neural networks (RNNs), and others rely on hybrid architectures combining CNNs and RNNs. However, based on existing studies, the relative merit of the various architectures remains unclear.

Uniqprimer, a software pipeline developed in Python, was deployed as a user-friendly internet tool in Rice Galaxy for comparative genome analyses to design primer sets for PCR assays capable of detecting target bacterial taxa. The pipeline was trialed with , a destructive broad-host-range bacterial pathogen found in most potato-growing regions. is a highly variable genus, and some primers available to detect this genus and species exhibit common diagnostic failures.

Background: Determining protein-protein interactions and their binding affinity is important in understanding cellular biological processes, discovery and design of novel therapeutics, protein engineering, and mutagenesis studies. Due to the time and effort required in wet lab experiments, computational prediction of binding affinity from sequence or structure is an important area of research. Structure-based methods, though more accurate than sequence-based techniques, are limited in their applicability due to the limited availability of protein structure data.

Abiotic stresses affect plant physiology, development, and growth, and alter pre-mRNA splicing. Western poplar is a model woody tree and a potential bioenergy feedstock. To investigate the extent of stress-regulated alternative splicing (AS), we conducted an in-depth survey of leaf, root, and stem xylem transcriptomes under drought, salt, or temperature stress.

Article Synopsis
  • Intron retention (IR) is a key alternative splicing mechanism in plants that enhances gene diversity and is influenced by chromatin structure and transcription speed.
  • The study utilizes DNase I-seq data from Arabidopsis and rice to show that IR events are enriched in DNase I Hypersensitive Sites (DHSs), indicating that retained introns have more open chromatin facilitating faster transcription.
  • The research also identifies DNA-binding protein footprints, suggesting that these proteins may play a role in regulating chromatin structure and consequently the occurrence of IR.