Introduction: Pharmacovigilance is vital for drug safety. The process typically involves two key steps: initial signal generation from spontaneous reporting systems (SRSs) and subsequent expert review to assess the signals' (potential) causality and decide on the appropriate action.
Methods: We propose a novel discovery and verification approach to pharmacovigilance based on electronic healthcare data.
This study aimed to develop convolutional neural network (CNN) models to predict the energy expenditure (EE) of children from raw accelerometer data. Additionally, this study sought to externally validate the CNN models along with the linear regression (LM), random forest (RF), and fully connected neural network (FcNN) models published in Steenbock (2019, 94-102). Included in this study were 41 German children (3.
Myosin-7a is an actin-based motor protein vital for auditory and visual processes. Mutations in myosin-7a lead to Usher syndrome type 1, the most common and severe form of deaf-blindness in humans. It is hypothesized that myosin-7a forms a transmembrane adhesion complex with other Usher proteins, essential for the structural-functional integrity of photoreceptor and cochlear hair cells.
Random survival forests (RSF) can be applied to many time-to-event research questions and are particularly useful in situations where the relationship between the independent variables and the event of interest is rather complex. However, in many clinical settings, the occurrence of the event of interest is affected by competing events, which means that a patient can experience an outcome other than the event of interest. Neglecting the competing event (i.
Summary: Due to their flexibility and superior performance, machine learning models frequently complement and outperform traditional statistical survival models. However, their widespread adoption is hindered by a lack of user-friendly tools to explain their internal operations and prediction rationales. To tackle this issue, we introduce the survex R package, which provides a cohesive framework for explaining any survival model by applying explainable artificial intelligence techniques.
Aims/hypothesis: There is increasing evidence for the existence of shared genetic predictors of metabolic traits and neurodegenerative disease. We previously observed a U-shaped association between fasting insulin in middle-aged women and dementia up to 34 years later. In the present study, we performed genome-wide association (GWA) analyses for fasting serum insulin in European children with a focus on variants associated with the tails of the insulin distribution.
Trees continuously regulate leaf physiology to acquire CO2 while simultaneously avoiding excessive water loss. The balance between these two processes, or water use efficiency (WUE), is fundamentally important to understanding changes in carbon uptake and transpiration from the leaf to the globe under environmental change. While increasing atmospheric CO2 (iCO2) is known to increase tree intrinsic water use efficiency (iWUE), less clear are the additional impacts of climate and acidic air pollution and how they vary by tree species.
This article describes a data-driven framework based on spatiotemporal machine learning to produce distribution maps for 16 tree species ( Mill., Mill., L.
Inspired by a previous experimental study of fish swimming near a cylinder, we numerically investigate the swimming and station-holding behavior of a flexible plate ahead of a circular cylinder whose motion is controlled by a proportional-derivative (PD) controller. Specifically, the deformation of this two-dimensional plate is actuated by a periodically varying external force applied on the body surface, which mimics the fish muscle force to produce propulsive thrust. The actuation force amplitude is dynamically adjusted by a feedback controller to instruct the plate to swim the desired distance from an initial position to a target location and then hold the station there.
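As a rough illustration of proportional-derivative feedback for reaching and holding a target position, the sketch below drives a one-dimensional unit point mass toward a target. This is a deliberate simplification and not the flexible-plate fluid simulation from the abstract: the function name `simulate_pd`, the gains `kp` and `kd`, the time step, and the point-mass dynamics are all assumptions chosen for illustration.

```python
def simulate_pd(target, kp=4.0, kd=4.0, dt=0.01, steps=1000):
    """Move a unit point mass from x = 0 toward `target` with PD feedback.

    Hypothetical toy model: the control force acts directly on the mass,
    whereas the study adjusts the amplitude of a periodic actuation force.
    """
    x, v = 0.0, 0.0
    for _ in range(steps):
        error = target - x
        # Proportional term pulls toward the target; the derivative term
        # (error rate = -v for a fixed target) damps the approach.
        force = kp * error - kd * v
        v += force * dt  # unit mass, so acceleration equals force
        x += v * dt      # semi-implicit Euler position update
    return x


final_position = simulate_pd(target=1.0)
print(abs(final_position - 1.0) < 1e-3)
```

With these assumed gains the toy system is critically damped, so the mass settles at the target without overshoot and then stays there, which is the station-holding behavior in miniature.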
The Translational Machine (TM) is a machine learning (ML)-based analytic pipeline that translates genotypic/variant call data into biologically contextualized features that richly characterize complex variant architectures and permit greater interpretability and biological replication. It also reduces potentially confounding effects of population substructure on outcome prediction. The TM consists of three main components.
Background: Childhood obesity is a complex multifaceted condition, which is influenced by genetics, environmental factors, and their interaction. However, these interactions have mainly been studied in twin studies and evidence from population-based cohorts is limited. Here, we analyze the interaction of an obesity-related genome-wide polygenic risk score (PRS) with sociodemographic and lifestyle factors for BMI and waist circumference (WC) in European children and adolescents.
Background: To evaluate a multicomponent health promotion program targeting preschoolers' physical activity (PA).
Methods: PA of children from 23 German daycare facilities (DFs; 13 intervention DFs, 10 control DFs) was measured via accelerometry at baseline and after 12 months. Children's sedentary time, light PA, and moderate to vigorous PA were estimated.
Random forests have become an established tool for classification and regression, in particular in high-dimensional settings and in the presence of non-additive predictor-response relationships. For bounded outcome variables restricted to the unit interval, however, classical modeling approaches based on mean squared error loss may severely suffer as they do not account for heteroscedasticity in the data. To address this issue, we propose a random forest approach for relating a beta dis-tributed outcome to a set of explanatory variables.
BMC Bioinformatics
June 2019
Background: In recent years, more and more multi-omics data have become available, that is, data featuring measurements of several types of omics data for each patient. Using multi-omics data as covariate data in outcome prediction is both promising and challenging due to the complex structure of such data. Random forest is a prediction method known for its ability to render complex dependency patterns between the outcome and the covariates.
In this paper, we give an overview of methodological issues related to the use of statistical learning approaches when analyzing high-dimensional genetic data. The focus is set on regression models and machine learning algorithms taking genetic variables as input and returning a classification or a prediction for the target variable of interest; for example, the present or future disease status, or the future course of a disease. After briefly explaining the basic motivation and principle of these methods, we review different procedures that can be used to evaluate the accuracy of the obtained models and discuss common flaws that may lead to over-optimistic conclusions with respect to their prediction performance and usefulness.
One reason for the widespread success of random forests (RFs) is their ability to analyze most datasets without preprocessing. For example, in contrast to many other statistical methods and machine learning approaches, no recoding such as dummy coding is required to handle ordinal and nominal predictors. The standard approach for nominal predictors is to consider all 2^(k-1) - 1 2-partitions of the k predictor categories.
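To make the count of candidate splits concrete: a nominal predictor with k distinct categories can be divided into two non-empty groups in 2^(k-1) - 1 ways. The sketch below enumerates them with the Python standard library. The helper name `two_partitions` is hypothetical and the code is only an illustration of the combinatorics, not the splitting routine of any particular random forest implementation.

```python
from itertools import combinations


def two_partitions(categories):
    """Yield each split of distinct `categories` into two non-empty groups once."""
    cats = list(categories)
    first, rest = cats[0], cats[1:]
    # Pin the first category to the left group so that a split and its
    # mirror image (left and right swapped) are not both generated.
    for size in range(len(rest) + 1):
        for extra in combinations(rest, size):
            left = (first, *extra)
            right = tuple(c for c in rest if c not in extra)
            if right:  # skip the trivial split whose right group is empty
                yield left, right


splits = list(two_partitions(["a", "b", "c", "d"]))
print(len(splits) == 2 ** (4 - 1) - 1)
```

Because the count grows exponentially in k, exhaustive enumeration like this quickly becomes expensive for predictors with many categories.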
Random forest and similar machine learning techniques are already used to generate spatial predictions, but the spatial location of points (geography) is often ignored in the modeling process. Spatial auto-correlation, especially if still present in the cross-validation residuals, indicates that the predictions may be biased, which is suboptimal. This paper presents a random forest for spatial predictions framework (RFsp) where buffer distances from observation points are used as explanatory variables, thus incorporating geographical proximity effects into the prediction process.
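The idea of using buffer distances as explanatory variables can be sketched as a feature-construction step: each location receives one feature per observation point, holding its distance to that point. The function name `buffer_distance_features` and the use of planar Euclidean distance are assumptions for illustration; the published framework may compute distances differently (for example on raster grids).

```python
import math


def buffer_distance_features(locations, observation_points):
    """Return one row per location with its distance to every observation point.

    Hypothetical helper: coordinates are (x, y) tuples in a projected
    (planar) coordinate system, so Euclidean distance is meaningful.
    """
    return [
        [math.dist(loc, obs) for obs in observation_points]
        for loc in locations
    ]


observations = [(0.0, 0.0), (3.0, 4.0)]
features = buffer_distance_features([(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)], observations)
print(features)
```

The resulting rows could then be passed, alongside any other covariates, to a random forest learner of the reader's choice.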
Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz
September 2018
Adverse drug reactions are among the leading causes of death. Pharmacovigilance aims to monitor drugs after they have been released to the market in order to detect potential risks. Data sources commonly used to this end are spontaneous reports sent in by doctors or pharmaceutical companies.
Motivation: Random forests are fast, flexible and represent a robust approach to analyze high dimensional data. A key advantage over alternative machine learning algorithms is the availability of variable importance measures, which can be used to identify relevant features or perform variable selection. Measures based on the impurity reduction of splits, such as the Gini importance, are popular because they are simple and fast to compute.
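For readers unfamiliar with impurity-based measures, the quantity they build on is the drop in Gini impurity achieved by a single split. A minimal sketch, assuming a classification setting with hypothetical helper names `gini` and `impurity_decrease`, is shown below; it covers one split only, not the aggregation over all trees that yields the final importance score.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two child nodes."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


parent = [0, 0, 1, 1]
print(impurity_decrease(parent, left=[0, 0], right=[1, 1]))
```

A perfectly separating split of a balanced two-class node, as in the usage lines above, removes all of the parent's impurity.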
Mutations in mitochondrial DNA (mtDNA) lead to heteroplasmy, i.e., the intracellular coexistence of wild-type and mutant mtDNA strands, which impact a wide spectrum of diseases but also physiological processes, including endurance exercise performance in athletes.
The advancement of high-throughput sequencing technologies enables sequencing of human genomes at steadily decreasing costs and increasing quality. Before variants can be analyzed, e.g.
This paper describes the technical development and accuracy assessment of the most recent and improved version of the SoilGrids system at 250 m resolution (June 2016 update). SoilGrids provides global predictions for standard numeric soil properties (organic carbon, bulk density, Cation Exchange Capacity (CEC), pH, soil texture fractions and coarse fragments) at seven standard depths (0, 5, 15, 30, 60, 100 and 200 cm), in addition to predictions of depth to bedrock and distribution of soil classes based on the World Reference Base (WRB) and USDA classification systems (ca. 280 raster layers in total).
The most popular approach for analyzing survival data is the Cox regression model. The Cox model may, however, be misspecified, and its proportionality assumption may not always be fulfilled. An alternative approach for survival prediction is random forests for survival outcomes.