The majority of chemicals detected via nontarget liquid chromatography high-resolution mass spectrometry (HRMS) in environmental samples remain unidentified, challenging the capability of existing machine learning models to pinpoint potential endocrine disruptors (EDs). Here, we predict the activity of unidentified chemicals across 12 bioassays related to EDs within the Tox21 10K dataset. For this purpose, single- and multi-output models were trained using various machine learning algorithms, with molecular fingerprint features as input.
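As a rough illustration of what "multi-output" means here, the sketch below predicts several assay labels at once from a binary fingerprint. It assumes fingerprints are stored as sets of on-bit indices and swaps in a Tanimoto nearest-neighbour vote as a stand-in, since the abstract does not name the algorithms used. The toy fingerprints, labels and function names are hypothetical and are not taken from the study.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def predict_multi_output(query: Set[int],
                         train_fps: List[Set[int]],
                         train_labels: List[List[int]],
                         k: int = 3) -> List[int]:
    """Predict one 0/1 label per assay by majority vote over the k nearest neighbours."""
    ranked = sorted(range(len(train_fps)),
                    key=lambda i: tanimoto(query, train_fps[i]),
                    reverse=True)
    neighbours = ranked[:k]
    n_assays = len(train_labels[0])
    # A label is predicted active when more than half of the neighbours are active.
    return [int(2 * sum(train_labels[i][j] for i in neighbours) > len(neighbours))
            for j in range(n_assays)]


# Hypothetical toy data: six fingerprints, two assays per chemical.
TRAIN_FPS = [{1, 2, 3, 4}, {1, 2, 3, 5}, {2, 3, 4, 6},
             {10, 11, 12}, {10, 11, 13}, {11, 12, 14}]
TRAIN_LABELS = [[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 0]]

print(predict_multi_output({1, 2, 3, 6}, TRAIN_FPS, TRAIN_LABELS))
```

A single-output setup would instead fit one such predictor per assay; the multi-output form shares the neighbour search across all assays.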
SPHERE is a large multidisciplinary project to research and develop a sensor network that facilitates home healthcare through activity monitoring, specifically of activities of daily living. It aims to use the latest technologies in low-power sensors, the Internet of Things, machine learning and automated decision making to provide benefits to patients and clinicians. This dataset comprises data collected from a SPHERE sensor network deployment during a set of experiments conducted in the 'SPHERE House' in Bristol, UK, during 2016, including video tracking, accelerometer and environmental sensor data obtained from volunteers undertaking both scripted and non-scripted activities of daily living in a domestic residence.
Non-targeted screening with LC/ESI/HRMS aims to identify the structure of the detected compounds using their retention time, exact mass, and fragmentation pattern. Challenges remain in differentiating between isomeric compounds. One untapped possibility to facilitate identification of isomers relies on different ionic species formed in electrospray.
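For orientation, common singly charged positive-mode ionic species of one neutral molecule appear at fixed m/z offsets from its monoisotopic mass. Isomers share these m/z values, so any discrimination would have to come from which species form and how abundantly, not from the values themselves. The sketch below only computes the expected m/z values. The adduct list, the function name and the caffeine example are illustrative assumptions and do not describe the method of the article.

```python
from typing import Dict

# Mass shifts in Da for singly charged positive-mode species
# (mass of the added ion, already corrected for the missing electron).
ADDUCT_SHIFTS: Dict[str, float] = {
    "[M+H]+": 1.007276,
    "[M+NH4]+": 18.033823,
    "[M+Na]+": 22.989218,
    "[M+K]+": 38.963158,
}


def adduct_mz(monoisotopic_mass: float) -> Dict[str, float]:
    """Expected m/z of each singly charged adduct for a neutral monoisotopic mass."""
    return {name: monoisotopic_mass + shift
            for name, shift in ADDUCT_SHIFTS.items()}


# Illustrative example: caffeine, C8H10N4O2, monoisotopic mass 194.080376 Da.
for species, mz in adduct_mz(194.080376).items():
    print(f"{species:10s} {mz:.6f}")
```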
A brachyury(+) mesodermal cell population with a purity of over 79% was obtained from differentiating brachyury embryonic stem cells (ESCs) generated with brachyury promoter-driven enhanced green fluorescent protein and puromycin-N-acetyltransferase. A comprehensive transcriptomic analysis of brachyury(+) cells enriched by puromycin application from 6-day-old embryoid bodies (EBs), 6-day-old control EBs and undifferentiated ESCs led to the identification of 1573 uniquely up-regulated and 1549 uniquely down-regulated transcripts in brachyury(+) cells. Furthermore, among the transcripts up-regulated in brachyury(+) cells, several Gene Ontology annotations (cell differentiation, blood vessel morphogenesis, striated muscle development, placenta development and cell motility) and Kyoto Encyclopedia of Genes and Genomes pathway annotations (mitogen-activated protein kinase signaling and transforming growth factor beta signaling) were overrepresented.
We present MEM (Multi-Experiment Matrix), a web resource for gene expression similarity searches across many datasets. MEM features large collections of microarray datasets and utilizes rank aggregation to merge information from different datasets into a single global ordering with simultaneous statistical significance estimation. Unique features of MEM include automatic detection, characterization and visualization of the datasets that include the strongest coexpression patterns.
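The idea of merging per-dataset rankings into one global ordering can be sketched with a plain mean-rank aggregation. This is a deliberately simple stand-in: the abstract does not specify MEM's aggregation method, and the sketch performs no significance estimation. The function name, the penalty rank for missing genes and the toy gene lists are assumptions made for illustration.

```python
from statistics import mean
from typing import List


def aggregate_ranks(rankings: List[List[str]]) -> List[str]:
    """Order genes by their mean rank across several per-dataset rankings.

    Each ranking lists genes from most to least similar to the query.
    A gene absent from a ranking gets the penalty rank len(ranking) + 1.
    """
    genes = sorted({gene for ranking in rankings for gene in ranking})

    def mean_rank(gene: str) -> float:
        return mean(ranking.index(gene) + 1 if gene in ranking
                    else len(ranking) + 1
                    for ranking in rankings)

    # Ties are broken alphabetically so the result is deterministic.
    return sorted(genes, key=lambda gene: (mean_rank(gene), gene))


# Hypothetical similarity rankings from three datasets.
RANKINGS = [
    ["geneA", "geneB", "geneC", "geneD"],
    ["geneB", "geneA", "geneD", "geneC"],
    ["geneA", "geneC", "geneB"],
]

print(aggregate_ranks(RANKINGS))
```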
Measuring gene expression levels with microarrays is one of the key technologies of modern genomics. Clustering of microarray data is an important application, as genes with similar expression profiles may be regulated by common pathways and involved in related functions. Gene Ontology (GO) analysis and visualization allow researchers to study the biological context of discovered clusters and characterize genes with previously unknown functions.
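One common way to ask whether a GO term characterizes a cluster is a one-sided hypergeometric test for overrepresentation. The sketch below assumes that approach, which the abstract does not confirm as the tool's actual method. The function name, parameter names and example counts are hypothetical.

```python
from math import comb


def hypergeom_pvalue(population: int, annotated: int,
                     cluster_size: int, hits: int) -> float:
    """P(X >= hits) for a cluster drawn without replacement.

    population   : total number of genes considered
    annotated    : genes in the population carrying the GO term
    cluster_size : number of genes in the cluster
    hits         : genes in the cluster carrying the GO term
    """
    upper = min(annotated, cluster_size)
    total = comb(population, cluster_size)
    tail = sum(comb(annotated, k) * comb(population - annotated, cluster_size - k)
               for k in range(hits, upper + 1))
    return tail / total


# Hypothetical counts: 20000 genes, 200 with the term, a cluster of 50 with 5 hits.
print(hypergeom_pvalue(20000, 200, 50, 5))
```

In practice many GO terms are tested per cluster, so the resulting p-values would need a multiple-testing correction.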
The Alternative Splicing and Transcript Diversity database (ASTD) gives access to a vast collection of alternative transcripts that integrate transcription initiation, polyadenylation and splicing variant data. Alternative transcripts are derived from the mapping of transcribed sequences to the complete human, mouse and rat genomes using an extension of the computational pipeline developed for the ASD (Alternative Splicing Database) and ATD (Alternative Transcript Diversity) databases, which are now superseded by ASTD. For the human genome, ASTD identifies splicing variants, transcription initiation variants and polyadenylation variants in 68%, 68% and 62% of the gene set, respectively, consistent with current estimates for transcription variation.
Background: Agglomerative hierarchical clustering (AHC) is a common unsupervised data analysis technique used in several biological applications. Standard AHC methods require that all pairwise distances between data objects be known. With ever-increasing data sizes, this quadratic complexity poses problems that cannot be overcome by simply waiting for faster computers.
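The quadratic cost is easy to see in a naive implementation of standard AHC, which first fills a table of all n(n-1)/2 pairwise distances. The sketch below uses single linkage and Euclidean distance as assumed choices. It illustrates the baseline problem only, not the method proposed in the article, and the function names and toy points are hypothetical.

```python
from itertools import combinations
from math import dist
from typing import Dict, List, Sequence, Set, Tuple

Point = Sequence[float]


def pairwise_distances(points: List[Point]) -> Dict[Tuple[int, int], float]:
    """All n*(n-1)/2 Euclidean distances, keyed by index pair (i, j) with i < j."""
    return {(i, j): dist(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)}


def single_linkage(points: List[Point], n_clusters: int) -> List[Set[int]]:
    """Naive single-linkage AHC: merge the two closest clusters until n_clusters remain."""
    distances = pairwise_distances(points)  # quadratic in the number of points
    clusters: List[Set[int]] = [{i} for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Single linkage: cluster distance is the smallest member-to-member distance.
            d = min(distances[(min(i, j), max(i, j))]
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] |= clusters[b]
        del clusters[b]  # a < b, so index a is unaffected by the deletion
    return clusters


# Hypothetical 2-D points forming three visually separate groups.
POINTS = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0)]

print(len(pairwise_distances(POINTS)))
print(single_linkage(POINTS, 3))
```

For n objects the distance table alone holds n(n-1)/2 entries, which is why this approach stops scaling as datasets grow.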