The reproducibility of experiments has been a long-standing impediment to further scientific progress. Computational methods have been instrumental in drug discovery efforts owing to their multifaceted use in data collection, pre-processing, analysis and inference. This article provides in-depth coverage of the reproducibility of computational drug discovery.
The increasing complexity of data and analysis methods has created an environment where scientists, who may not have formal training, are finding themselves playing the impromptu role of software engineer. While several resources are available for introducing scientists to the basics of programming, researchers have been left with little guidance on approaches needed to advance to the next level for the development of robust, large-scale data analysis tools that are amenable to integration into workflow management systems, tools, and frameworks. The integration into such workflow systems necessitates additional requirements on computational tools, such as adherence to standard conventions for robustness, data input, output, logging, and flow control.
Background: The complex nature of biological data has driven the development of specialized software tools. Scientific workflow management systems simplify the assembly of such tools into pipelines, assist with job automation, and aid reproducibility of analyses. Many contemporary workflow tools are specialized or not designed for highly complex workflows, such as with nested loops, dynamic scheduling, and parametrization, which is common in, e.
Background: Metabolomics is the comprehensive study of a multitude of small molecules to gain insight into an organism's metabolism. The research field is dynamic and expanding with applications across biomedical, biotechnological, and many other applied biological domains. Its computationally intensive nature has driven requirements for open data formats, data repositories, and data analysis tools.
Ligand-based models can be used in drug discovery to obtain an early indication of potential off-target interactions that could be linked to adverse effects. Another application is to combine such models into a panel, making it possible to compare and search for compounds with similar profiles. Most contemporary methods and implementations, however, lack valid measures of confidence in their predictions and provide only point predictions.
Lipophilicity is a major determinant of ADMET properties and overall suitability of drug candidates. We have developed large-scale models to predict water-octanol distribution coefficient (logD) for chemical compounds, aiding drug discovery projects. Using ACD/logD data for 1.
J Biomed Semantics
September 2017
Background: Biological sciences are characterised not only by an increasing amount of data but also by the extreme complexity of those data. This stresses the need for efficient ways of integrating these data in a coherent description of biological systems. In many cases, biological data need organization before integration.
Here we describe the SweGen data set, a comprehensive map of genetic variation in the Swedish population. These data represent a basic resource for clinical genetics laboratories as well as for sequencing-based association studies by providing information on genetic variant frequencies in a cohort that is well matched to national patient cohorts. To select samples for this study, we first examined the genetic structure of the Swedish population using high-density SNP-array data from a nation-wide cohort of over 10 000 Swedish-born individuals included in the Swedish Twin Registry.
Predictive modelling in drug discovery is challenging to automate as it often contains multiple analysis steps and might involve cross-validation and parameter tuning that create complex dependencies between tasks. With large-scale data or when using computationally demanding modelling methods, e-infrastructures such as high-performance or cloud computing are required, adding to the existing challenges of fault-tolerant automation. Workflow management systems can aid in many of these challenges, but the currently available systems are lacking in the functionality needed to enable agile and flexible predictive modelling.
The increasing size of datasets in drug discovery makes it challenging to build robust and accurate predictive models within a reasonable amount of time. In order to investigate the effect of dataset sizes on predictive performance and modelling time, ligand-based regression models were trained on open datasets of varying sizes of up to 1.2 million chemical structures.
Quantifying population status is a key objective in many ecological studies, but is often difficult to achieve for cryptic or elusive species. Here, non-invasive genetic capture-mark-recapture (CMR) methods have become a very important tool to estimate population parameters, such as population size and sex ratio. The Eurasian otter (Lutra lutra) is such an elusive species of management concern and is increasingly studied using faecal-based genetic sampling.
Analyzing and storing data and results from next-generation sequencing (NGS) experiments is a challenging task, hampered by ever-increasing data volumes and frequent updates of analysis methods and tools. Storage and computation have grown beyond the capacity of personal computers and there is a need for suitable e-infrastructures for processing. Here we describe UPPNEX, an implementation of such an infrastructure, tailored to the needs of data storage and analysis of NGS data in Sweden serving various labs and multiple instruments from the major sequencing technology platforms.
Background: Semantic web technologies are finding their way into the life sciences. Ontologies and semantic markup have already been used for more than a decade in molecular sciences, but have not found widespread use yet. The semantic web technology Resource Description Framework (RDF) and related methods appear to be sufficiently versatile to change that situation.
Groups of neurons form ordered topographic maps on their targets, and defining the mechanisms that develop such maps, and re-connect them after disruption, has biological as well as clinical importance. The neuromuscular system is an accessible and well-studied model for defining the principles that guide map formation, both during its development and its reformation after motor nerve damage. We present evidence for the expression of this map at the level of nerve terminal morphology and muscle fiber type in the serratus anterior muscle.
Brain Res Dev Brain Res
November 2004
Motor neurons project onto specific muscles with a distinct positional bias. We have previously shown using electrophysiological techniques that overexpression of ephrin-A5 degrades this topographic map. Here, we show that positional differences in axon terminal areas, an entirely different parameter of neuromuscular topography, are also eliminated with ephrin-A5 overexpression.
Motor neuron pools innervate muscle fibers forming an ordered topographic map. In the gluteus maximus (GM) muscle, as well as additional muscles, we and others have demonstrated electrophysiologically that there exists a rostrocaudal distribution of axon terminals on the muscle surface. The role of muscle fiber type in determining this topography is unknown.