Selecting biomarker panels from omics data, where molecular features are numerous and samples are limited, often requires machine learning methods paired with wrapper feature selection techniques such as genetic algorithms. These techniques test various feature sets (candidate biomarker solutions) to fine-tune a machine learning model's performance on supervised tasks, such as classifying cancer subtypes. The optimization uses validation sets to evaluate and identify the most effective feature combinations.
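As a rough illustration of wrapper feature selection scored on a validation set, the sketch below enumerates every feature subset of a tiny toy dataset and keeps the one with the best validation accuracy. It is not the method of the article: an exhaustive search stands in for the genetic algorithm, the nearest-centroid classifier is an arbitrary choice, and all data values and function names are invented for the example.

```python
from itertools import combinations

# Toy data, invented for illustration: feature 0 is informative,
# feature 1 is misleading noise, feature 2 is constant.
X_train = [[0.0, 4.0, 1.0], [0.2, 6.0, 1.0], [1.0, -4.0, 1.0], [0.8, -6.0, 1.0]]
y_train = [0, 0, 1, 1]
X_val = [[0.1, -5.0, 1.0], [0.9, 5.0, 1.0]]
y_val = [0, 1]


def centroids(X, y, subset):
    """Per-class mean of the selected features."""
    result = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        result[label] = [sum(r[j] for r in rows) / len(rows) for j in subset]
    return result


def validation_accuracy(subset):
    """Fit a nearest-centroid classifier on the training set, score it on the validation set."""
    cents = centroids(X_train, y_train, subset)
    correct = 0
    for x, label in zip(X_val, y_val):
        point = [x[j] for j in subset]
        # Predict the class whose centroid is closest (squared Euclidean distance).
        predicted = min(
            cents, key=lambda c: sum((p - q) ** 2 for p, q in zip(point, cents[c]))
        )
        correct += predicted == label
    return correct / len(y_val)


n_features = len(X_train[0])
# Every non-empty feature subset is a candidate biomarker panel.
subsets = [s for k in range(1, n_features + 1) for s in combinations(range(n_features), k)]
scores = {s: validation_accuracy(s) for s in subsets}
# Prefer higher validation accuracy, then fewer features.
best = max(scores, key=lambda s: (scores[s], -len(s)))
```

On this toy data the search keeps only feature 0, because including the noisy feature 1 fits the training set but fails on the validation set. A genetic algorithm would replace the exhaustive enumeration once the number of features makes it infeasible.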
IEEE/ACM Trans Comput Biol Bioinform
October 2024
Machine learning algorithms have been extensively used for accurate classification of cancer subtypes driven by gene expression-based biomarkers. However, biomarker models combining multiple gene expression signatures are often not reproducible in external validation datasets and their feature set size is often not optimized, jeopardizing their translatability into cost-effective clinical tools. We investigated how to solve the multi-objective problem of finding the best trade-offs between classification performance and set size by applying seven algorithms for machine learning-driven feature subset selection, and analysed how they perform in a benchmark with eight large-scale transcriptome datasets of cancer, covering both training and external validation sets.
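The trade-off between classification performance and set size can be pictured as a Pareto front: the candidate panels that no other candidate beats on both objectives at once. The sketch below is a minimal illustration with made-up (accuracy, number of features) pairs; it is not one of the seven benchmarked algorithms, and the function names are assumptions.

```python
from typing import List, Tuple

# Each candidate is (accuracy, n_features); the values are invented for illustration.
Candidate = Tuple[float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on at least one."""
    acc_a, size_a = a
    acc_b, size_b = b
    no_worse = acc_a >= acc_b and size_a <= size_b
    strictly_better = acc_a > acc_b or size_a < size_b
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


candidates = [(0.90, 50), (0.88, 10), (0.85, 10), (0.80, 3), (0.90, 60)]
front = pareto_front(candidates)
```

Here `front` keeps the 50-gene, 10-gene, and 3-gene panels: each gives up some accuracy in exchange for a smaller set, and none is strictly worse than another.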
J Biomed Inform
November 2024
The proliferation of omics data has advanced cancer biomarker discovery but often falls short in external validation, mainly due to a narrow focus on prediction accuracy that neglects clinical utility and validation feasibility. We introduce three- and four-objective optimization strategies based on genetic algorithms to identify clinically actionable biomarkers in omics studies, addressing classification tasks aimed at distinguishing hard-to-differentiate cancer subtypes beyond histological analysis alone. Our hypothesis is that by optimizing more than one characteristic of cancer biomarkers, we may identify biomarkers that are more likely to succeed in external validation.
Motivation: Cancer is a very heterogeneous disease that can be difficult to treat without addressing the specific mechanisms driving tumour progression in a given patient. High-throughput screening and sequencing data from cancer cell lines have driven many advances in drug development; however, important aspects crucial to precision medicine are often overlooked, namely the inherent differences between tumours in patients and the cell lines used to model them in vitro. Recent developments in transfer learning methods for patient and cell-line data have shown progress in translating results from cell lines to individual patients in silico.
Recent research on multi-view clustering algorithms for complex disease subtyping often overlooks aspects like clustering stability and critical assessment of prognostic relevance. Furthermore, current frameworks do not allow for a comparison between data-driven and pathway-driven clustering, highlighting a significant gap in the methodology. We present the COPS R-package, tailored for robust evaluation of single and multi-omics clustering results.
Hyaluronan (HA) accumulation in clear cell renal cell carcinoma (ccRCC) is associated with poor prognosis; however, its biology and role in tumorigenesis are unknown. RNA sequencing of 48 HA-positive and 48 HA-negative formalin-fixed paraffin-embedded (FFPE) samples was performed to identify differentially expressed genes (DEG). The DEGs were subjected to pathway and gene enrichment analyses.
Machine learning (ML) methods are increasingly becoming crucial in genome-wide association studies for identifying key genetic variants or SNPs that statistical methods might overlook. Statistical methods predominantly identify SNPs with notable effect sizes by conducting association tests on individual genetic variants, one at a time, to determine their relationship with the target phenotype. These genetic variants are then used to create polygenic risk scores (PRSs), estimating an individual's genetic risk for complex diseases like cancer or cardiovascular disorders.
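In its simplest additive form, a polygenic risk score is a weighted sum: each selected variant's effect size is multiplied by the number of risk alleles the individual carries, and the products are added up. The sketch below illustrates only that arithmetic; the SNP identifiers, effect sizes, and genotype are hypothetical, and real PRS pipelines involve further steps not shown here.

```python
# Hypothetical per-SNP effect sizes (for example, log odds ratios from association tests).
effect_sizes = {"snp_a": 0.5, "snp_b": -0.25, "snp_c": 1.0}

# Hypothetical genotype: number of risk alleles (0, 1 or 2) carried at each SNP.
genotype = {"snp_a": 2, "snp_b": 1, "snp_c": 0}


def polygenic_risk_score(effects, dosages):
    """Sum of effect size times risk-allele count; SNPs missing from the genotype count as 0."""
    return sum(beta * dosages.get(snp, 0) for snp, beta in effects.items())


score = polygenic_risk_score(effect_sizes, genotype)
```

For this individual the score is 0.5 × 2 + (−0.25) × 1 + 1.0 × 0 = 0.75.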
Aims: Vascular smooth muscle cells (SMCs) and their derivatives are key contributors to the development of atherosclerosis. However, studying changes in SMC gene expression in heterogeneous vascular tissues is challenging due to the technical limitations and high cost associated with current approaches. In this paper, we apply translating ribosome affinity purification sequencing to profile SMC-specific gene expression directly from tissue.
Background: Atopic dermatitis (AD) is a prevalent chronic inflammatory skin disease whose pathophysiology involves the interplay between genetic and environmental factors, ultimately leading to dysfunction of the epidermis. While several treatments are effective in symptom management, many existing therapies offer only temporary relief and often come with side effects. For this reason, the formulation of an effective therapeutic plan is challenging and there is a need for more effective and targeted treatments that address the root causes of the condition.
The growing nanoparticulate pollution (e.g. engineered nanoparticles (NPs) or nanoplastics) has been shown to pose potential threats to human health.
Bioinform Adv
October 2022
Motivation: Gene expression-based classifiers are often developed using historical data by training a model on a small set of patients and a large set of features. Models trained in such a way can be afterwards applied for predicting the output for new unseen patient data. However, very often the accuracy of these models starts to decrease as soon as new data is fed into the trained model.
In recent years, a growing interest in the characterization of the molecular basis of psoriasis has been observed. However, despite the availability of a large amount of molecular data, many pathogenic mechanisms of psoriasis are still poorly understood. In this study, we performed an integrated analysis of 23 public transcriptomic datasets encompassing both lesional and uninvolved skin samples from psoriasis patients.
Motivation: In modern translational research, the development of biomarkers heavily relies on the use of omics technologies, but implementations with basic data mining algorithms frequently lead to false positives. Non-dominated Sorting Genetic Algorithm II (NSGA2) is an extremely effective algorithm for biomarker discovery but has rarely been evaluated against large-scale datasets. Exploration of the feature search space is the key to NSGA2's success, but in specific cases NSGA2 exhibits a shallow exploration of the space of possible feature combinations, possibly leading to models with low predictive performance.
There is an urgent need to apply effective, data-driven approaches to reliably predict engineered nanomaterial (ENM) toxicity. Here we introduce a predictive computational framework based on the molecular and phenotypic effects of a large panel of ENMs across multiple in vitro and in vivo models. Our methodology allows for the grouping of ENMs based on multi-omics approaches combined with robust toxicity tests.
The molecular effects of exposures to engineered nanomaterials (ENMs) are still largely unknown. In classical inhalation toxicology, cell composition of bronchoalveolar lavage (BAL) is a toxicity indicator at the lung tissue level that can aid in interpreting pulmonary histological changes. Toxicogenomic approaches help characterize the mechanism of action (MOA) of ENMs by investigating the differentially expressed genes (DEG).
The network approach is quickly becoming a fundamental building block of computational methods aiming at elucidating the mechanism of action (MoA) and therapeutic effect of drugs. By modeling the effect of drugs and diseases on different biological networks, it is possible to better explain the interplay between disease perturbations and drug targets as well as how drug compounds induce favorable biological responses and/or adverse effects. Omics technologies have been extensively used to generate the data needed to study the mechanisms of action of drugs and diseases.
Despite remarkable efforts of computational and predictive pharmacology to improve therapeutic strategies for complex diseases, only in a few cases have the predictions been eventually employed in the clinics. One of the reasons behind this drawback is that current predictive approaches are based only on the integration of molecular perturbation of a certain disease with drug sensitivity signatures, neglecting intrinsic properties of the drugs. Here we integrate mechanistic and chemocentric approaches to drug repositioning by developing an innovative network pharmacology strategy.
Comput Struct Biotechnol J
March 2022
The recent advancements in toxicogenomics have led to the availability of large omics data sets, representing the starting point for studying the exposure mechanism of action and identifying candidate biomarkers for toxicity prediction. The current lack of standard methods in data generation and analysis hampers the full exploitation of toxicogenomics-based evidence in regulatory risk assessment. Moreover, the pipelines for the preprocessing and downstream analyses of toxicogenomic data sets can be quite challenging to implement.
Purpose: Endocrine disruptors are a rising concern due to the wide array of health issues that they can cause. Although there are tools for mode of action (MoA)-based prediction of endocrine disruption (e.g.
Biomarkers are valuable indicators of the state of a biological system. Microarray technology has been extensively used to identify biomarkers and build computational predictive models for disease prognosis, drug sensitivity and toxicity evaluations. Activation biomarkers can be used to understand the underlying signaling cascades, mechanisms of action and biological cross talk.
Typical clustering analysis for large-scale genomics data combines two unsupervised learning techniques: dimensionality reduction and clustering (DR-CL) methods. It has been demonstrated that transforming gene expression to pathway-level information can improve the robustness and interpretability of disease grouping results. This approach, referred to as biological knowledge-driven clustering (BK-CL) approach, is often neglected, due to a lack of tools enabling systematic comparisons with more established DR-based methods.
Endocrine disrupting compounds (EDCs) are a persistent threat to humans and wildlife due to their ability to interfere with endocrine signaling pathways. Inspired by previous work to improve chemical hazard identification through the use of toxicogenomics data, we developed a genomic-oriented data space for profiling the molecular activity of EDCs in an in silico manner, and for creating predictive models that identify and prioritize EDCs. Predictive models of EDCs, derived from gene expression data from rats (in vivo and in vitro primary hepatocytes) and humans (in vitro primary hepatocytes and HepG2), achieve testing accuracy greater than 90%.
Contact dermatitis tremendously impacts the quality of life of suffering patients. Currently, diagnostic regimes rely on allergy testing, exposure specification, and follow-up visits; however, distinguishing the clinical phenotype of irritant and allergic contact dermatitis remains challenging. Employing integrative transcriptomic analysis and machine-learning approaches, we aimed to decipher disease-related signature genes to find suitable sets of biomarkers.
Despite considerable efforts, the properties that drive the cytotoxicity of engineered nanomaterials (ENMs) remain poorly understood. Here, the authors investigate a panel of 31 ENMs with different core chemistries and a variety of surface modifications using conventional in vitro assays coupled with omics-based approaches. Cytotoxicity screening and multiplex-based cytokine profiling reveal a good concordance between primary human monocyte-derived macrophages and the human monocyte-like cell line THP-1.
Background: After the Second World War, the population living in the Karelian region was strictly divided by the "iron curtain" between Finland and Russia. This resulted in differences in lifestyle, standard of living, and exposure to the environment. Allergic manifestations and sensitization to common allergens have been much more common on the Finnish side than on the Russian side.