Given the wide variability in the quality of next-generation sequencing data submitted to public repositories, it is essential to identify methods that can perform quality control on these data sets when additional quality control data, such as mean tile data, are missing from public repositories. In this study, we present evidence that correlating counts of reads corresponding to pairs of motifs separated by specific distances on individual exons can be used as a proxy for mean tile data in the data sets we analyzed, and hence could be used when mean tile data are not available. As test data sets, we use the in vitro transcribed (IVT) data set and a data set comprising wild-type and mutant samples.
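The motif-pair correlation idea can be illustrated with a minimal sketch. It rests on simplifying assumptions that are not taken from the abstract: each exon is supplied as its sequence plus a per-base array of read-start counts, and the count "corresponding to" a motif is the count at the motif's first base. All function names are hypothetical, and the published method may define counts, motifs and distances differently.

```python
import math
import re
from typing import List, Sequence, Tuple

# Hypothetical input format: one (sequence, per-base read-start counts) pair per exon.
Exon = Tuple[str, Sequence[int]]


def motif_positions(seq: str, motif: str) -> List[int]:
    """0-based start positions of every (possibly overlapping) match of motif."""
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", seq)]


def paired_counts(seq: str, counts: Sequence[int], motif_a: str, motif_b: str,
                  distance: int) -> List[Tuple[int, int]]:
    """Pair the count at each motif_a start with the count at a motif_b start
    exactly `distance` bases downstream, within the same exon only."""
    if len(counts) != len(seq):
        raise ValueError("counts must have one entry per base of seq")
    b_starts = set(motif_positions(seq, motif_b))
    return [(counts[i], counts[i + distance])
            for i in motif_positions(seq, motif_a)
            if i + distance in b_starts]


def pearson(pairs: Sequence[Tuple[float, float]]) -> float:
    """Pearson correlation between the first and second values of each pair."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant counts")
    return sxy / math.sqrt(sxx * syy)


def motif_pair_correlation(exons: Sequence[Exon], motif_a: str, motif_b: str,
                           distance: int) -> float:
    """Pool intra-exon motif pairs across all exons, then correlate their counts."""
    pairs: List[Tuple[int, int]] = []
    for seq, counts in exons:
        pairs.extend(paired_counts(seq, counts, motif_a, motif_b, distance))
    return pearson(pairs)


if __name__ == "__main__":
    # Made-up toy exons, for illustration only.
    toy_exons = [("ACGTACGTAC", [1, 0, 2, 0, 3, 0, 6, 0, 0, 0]),
                 ("ACGT", [5, 0, 10, 0])]
    print(motif_pair_correlation(toy_exons, "AC", "GT", distance=2))  # 1.0
```

Restricting pairs to a single exon keeps the comparison intra-exon; pooling those pairs across exons gives one correlation per (motif pair, distance) combination, which could then be compared between data sets.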
Front Res Metr Anal
July 2022
The FAIR data principles are rapidly becoming a standard by which to assess responsible and reproducible research. In contrast to the requirements associated with the Interoperability principle, the requirements associated with the Accessibility principle are often assumed to be relatively straightforward to implement. Indeed, a variety of tools for assessing FAIR compliance rely on the data being deposited in a trustworthy digital repository.
We evaluate recent efforts to further the effective teaching of the FAIR data principles by examining existing and developing educational frameworks focused on FAIR, training initiatives that have informed the teaching of FAIR skills, and a number of key sources for discovering FAIR training materials, including the extent to which those sources provide descriptive information about the materials. FAIR4S, which provides a coherent description of skills and competencies, is analyzed by target audience, using the description of actors found in a European Open Science Cloud ecosystem report, and by comparing the coverage and extent of description of the educational and training materials available from these sources. Our analysis highlights the importance of linking resources to FAIR-related educational frameworks, describing them consistently using a community-based metadata scheme, and developing an instructor community of practice in which ideas and methods for teaching FAIR data skills can be shared.
The systemic challenges of the COVID-19 pandemic require cross-disciplinary collaboration in a global and timely fashion. Such collaboration needs open research practices and the sharing of research outputs, such as data and code, thereby facilitating research, research reproducibility and timely collaboration beyond borders. The Research Data Alliance COVID-19 Working Group recently published a set of recommendations and guidelines on data sharing and related best practices for COVID-19 research.
Data science skills are rapidly becoming a necessity in modern science. In response to this need, institutions and organizations around the world are developing research data science curricula to teach the programming and computational skills that are needed to build and maintain data infrastructures and maximize the use of available data. To date, however, few of these courses have included an explicit ethics component, and developing such components can be challenging.
The paper reviews the use of the Hadoop platform in structural bioinformatics applications. For structural bioinformatics, Hadoop provides a new framework for analysing large fractions of the Protein Data Bank, which is key for high-throughput studies of, for example, protein-ligand docking, clustering of protein-ligand complexes and structural alignment. Specifically, we review a number of Hadoop implementations of high-throughput analyses reported in the literature, and their scalability.
Detecting sources of bias in transcriptomic data is essential to determine signals of biological significance. We outline a novel method to detect sequence-specific bias in short-read next-generation sequencing data. This is based on determining intra-exon correlations between specific motifs.
Background: The workflow for the production of high-throughput sequencing data from nucleic acid samples is complex. There are a series of protocol steps to be followed in the preparation of samples for next-generation sequencing. The quantification of bias in a number of protocol steps, namely DNA fractionation, blunting, phosphorylation, adapter ligation and library enrichment, remains to be determined.
We discuss the applicability of the Microsoft cloud computing platform, Azure, for bioinformatics. We focus on the usability of the resource rather than its performance. We provide an example of how R can be used on Azure to analyse a large amount of microarray expression data deposited at the public database ArrayExpress.
Our knowledge of the role of higher-order chromatin structures in transcription of microRNA genes (MIRs) is evolving rapidly. Here we investigate the effect of 3D architecture of chromatin on the transcriptional regulation of MIRs. We demonstrate that MIRs have transcriptional features that are similar to protein-coding genes.
Phytohormones regulate plant growth from cell division to organ development. Jasmonates (JAs) are signaling molecules that have been implicated in stress-induced responses. They have also been shown to inhibit plant growth, but the mechanisms involved are not well understood.
Probes with runs of four or more guanines (G-stacks) in their sequences can exhibit a level of hybridization that is unrelated to the expression levels of the mRNA that they are intended to measure. This is most likely caused by the formation of G-quadruplexes, where inter-probe guanines form Hoogsteen hydrogen bonds, which probes with G-stacks are capable of forming. We demonstrate that for a specific microarray data set using the Human HG_U133A Affymetrix GeneChip and RMA normalization there is significant bias in the expression levels, the fold change and the correlations between expression levels.
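Identifying the affected probes is a simple sequence scan. The sketch below assumes probe sequences are available as a mapping from (hypothetical) probe IDs to sequence strings; it only flags probes containing a G-stack and does not reproduce the bias analysis on the HG_U133A data set with RMA normalization.

```python
import re
from typing import Dict, List, Tuple


def has_g_stack(probe_seq: str, min_run: int = 4) -> bool:
    """True if the probe contains a run of at least min_run consecutive guanines."""
    # f"G{{{min_run},}}" builds the regex "G{4,}" for the default min_run.
    return re.search(f"G{{{min_run},}}", probe_seq.upper()) is not None


def split_by_g_stack(probes: Dict[str, str],
                     min_run: int = 4) -> Tuple[List[str], List[str]]:
    """Split probe IDs into (with G-stack, without G-stack), each sorted."""
    flagged = sorted(pid for pid, seq in probes.items() if has_g_stack(seq, min_run))
    clean = sorted(pid for pid, seq in probes.items() if not has_g_stack(seq, min_run))
    return flagged, clean


if __name__ == "__main__":
    # Hypothetical probe IDs and sequences, for illustration only.
    toy_probes = {"probe_a": "ACGGGGTACC", "probe_b": "ACGGGTAGGG"}
    print(split_by_g_stack(toy_probes))  # (['probe_a'], ['probe_b'])
```

The flagged set could then be analysed separately from, or excluded alongside, the remaining probes before fold changes or correlations are computed; that downstream step is not shown here.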
The shade avoidance syndrome (SAS) allows plants to anticipate and avoid shading by neighbouring plants by initiating an elongation growth response. The phytochrome photoreceptors are able to detect a reduction in the red:far red ratio in incident light, the result of selective absorption of red and blue wavelengths by proximal vegetation. A shade-responsive luciferase reporter line (PHYB::LUC) was used to carry out a high-throughput screen to identify novel SAS mutants.
In darkness, shoot apex growth is repressed, but it becomes rapidly activated by light. We show that phytochromes and cryptochromes play largely redundant roles in this derepression in Arabidopsis thaliana. We examined light-activated transcriptional changes in a finely resolved time course, comparing the shoot apex (meristem and leaf primordia) with the cotyledon, and found >5700 differentially expressed genes.
Robust methods to detect DNA-binding proteins from structures of unknown function are important for structural biology. This paper describes a method for identifying such proteins that (i) have a solvent-accessible structural motif necessary for DNA binding and (ii) have a positive electrostatic potential in the binding region. We focus on three structural motifs: helix-turn-helix (HTH), helix-hairpin-helix (HhH) and helix-loop-helix (HLH).
Nucleic Acids Res
December 2003
A method to detect DNA-binding sites on the surface of a protein structure is important for functional annotation. This work describes the analysis of residue patches on the surface of DNA-binding proteins and the development of a method of predicting DNA-binding sites using a single feature of these surface patches. Surface patches and the DNA-binding sites were initially analysed for accessibility, electrostatic potential, residue propensity, hydrophobicity and residue conservation.