The global cost-benefit analysis of pesticide use during the last 30 years has been characterized by a significant increase during the period from 1990 to 2007 followed by a decline. This observation can be attributed to several factors including, but not limited to, pest resistance, lack of novelty with respect to modes of action or classes of chemistry, and regulatory action. Due to current and projected increases of the global population, it is evident that the demand for food, and consequently, the usage of pesticides to improve yields will increase.
View Article and Find Full Text PDF"A Mathematical Theory of Communication" was published in 1948 by Claude Shannon to address the problems in the field of data compression and communication over (noisy) communication channels. Since then, the concepts and ideas developed in Shannon's work have formed the basis of information theory, a cornerstone of statistical learning and inference, and has been playing a key role in disciplines such as physics and thermodynamics, probability and statistics, computational sciences and biological sciences. In this article we review the basic information theory based concepts and describe their key applications in multiple major areas of research in computational biology-gene expression and transcriptomics, alignment-free sequence comparison, sequencing and error correction, genome-wide disease-gene association mapping, metabolic networks and metabolomics, and protein sequence, structure and interaction analysis.
View Article and Find Full Text PDFBMC Bioinformatics
August 2018
Background: Maize is a leading crop in the modern agricultural industry that accounts for more than 40% grain production worldwide. THe double haploid technique that uses fewer breeding generations for generating a maize line has accelerated the pace of development of superior commercial seed varieties and has been transforming the agricultural industry. In this technique the chromosomes of the haploid seeds are doubled and taken forward in the process while the diploids marked for elimination.
View Article and Find Full Text PDFTo characterize type 2 diabetes (T2D)-associated variation across the allele frequency spectrum, we conducted a meta-analysis of genome-wide association data from 26,676 T2D case and 132,532 control subjects of European ancestry after imputation using the 1000 Genomes multiethnic reference panel. Promising association signals were followed up in additional data sets (of 14,545 or 7,397 T2D case and 38,994 or 71,604 control subjects). We identified 13 novel T2D-associated loci ( < 5 × 10), including variants near the , , and genes.
View Article and Find Full Text PDFUnlabelled: Gene-based tests of association can increase the power of a genome-wide association study by aggregating multiple independent effects across a gene or locus into a single stronger signal. Recent gene-based tests have distinct approaches to selecting which variants to aggregate within a locus, modeling the effects of linkage disequilibrium, representing fractional allele counts from imputation, and managing permutation tests for p-values. Implementing these tests in a single, efficient framework has great practical value.
View Article and Find Full Text PDFThe rapidly growing amount of genomic sequence data being generated and made publicly available necessitate the development of new data storage and archiving methods. The vast amount of data being shared and manipulated also create new challenges for network resources. Thus, developing advanced data compression techniques is becoming an integral part of data production and analysis.
View Article and Find Full Text PDFGenome-wide association studies (GWAS) are now used routinely to identify SNPs associated with complex human phenotypes. In several cases, multiple variants within a gene contribute independently to disease risk. Here we introduce a novel Gene-Wide Significance (GWiS) test that uses greedy Bayesian model selection to identify the independent effects within a gene, which are combined to generate a stronger statistical signal.
View Article and Find Full Text PDFInformation-theoretic metrics have been proposed for studying gene-gene and gene-environment interactions in genetic epidemiology. Although these metrics have proven very promising, they are typically interpreted in the context of communications and information transmission, diminishing their tangibility for epidemiologists and statisticians. In this paper, we clarify the interpretation of information-theoretic metrics.
View Article and Find Full Text PDFCompound effects on cloned human Ether-à-go-go related gene (hERG) potassium channels have been used to assess the potential cardiac safety liabilities of drug development candidate compounds. In addition to radioactive ligand displacement tests, two other common approaches are surrogate ion-based flux assays and electrophysiological recordings. The former has much higher throughput, whereas the latter measures directly the effects on ionic currents.
View Article and Find Full Text PDFBackground: Multifactorial diseases such as cancer and cardiovascular diseases are caused by the complex interplay between genes and environment. The detection of these interactions remains challenging due to computational limitations. Information theoretic approaches use computationally efficient directed search strategies and thus provide a feasible solution to this problem.
View Article and Find Full Text PDFGene x gene interactions play important roles in the etiology of complex multi-factorial diseases like rheumatoid arthritis (RA). In this paper, we describe our use of a two-stage search strategy consisting of information theoretic methods and logistic regression to detect gene x gene interactions associated with RA using the data in Problem 1 of Genetic Analysis Workshop 16. Our method detected interactions of several SNPs (single-SNP and SNP x SNP) that are located on chromosomal regions linked to RA and related diseases in previous studies.
View Article and Find Full Text PDFInterest is increasing in epistasis as a possible source of the unexplained variance missed by genome-wide association studies. The Genetic Analysis Workshop 16 Group 9 participants evaluated a wide variety of classical and novel analytical methods for detecting epistasis, in both the statistical and machine learning paradigms, applied to both real and simulated data. Because the magnitude of epistasis is clearly relative to scale of penetrance, and therefore to some extent, to the choice of model framework, it is not surprising that strong interactions under one model might be minimized or even disappear entirely under a different modeling framework.
View Article and Find Full Text PDFBackground: The purpose of this research was to develop a novel information theoretic method and an efficient algorithm for analyzing the gene-gene (GGI) and gene-environmental interactions (GEI) associated with quantitative traits (QT). The method is built on two information-theoretic metrics, the k-way interaction information (KWII) and phenotype-associated information (PAI). The PAI is a novel information theoretic metric that is obtained from the total information correlation (TCI) information theoretic metric by removing the contributions for inter-variable dependencies (resulting from factors such as linkage disequilibrium and common sources of environmental pollutants).
View Article and Find Full Text PDFWe developed an information-theoretic metric called the Interaction Index for prioritizing genetic variations and environmental variables for follow-up in detailed sequencing studies. The Interaction Index was found to be effective for prioritizing the genetic and environmental variables involved in GEI for a diverse range of simulated data sets. The metric was also evaluated for a 103-SNP Crohn's disease dataset and a simulated data set containing 9187 SNPs and multiple covariates that was modeled on a rheumatoid arthritis data set.
View Article and Find Full Text PDFWe developed a computationally efficient algorithm AMBIENCE, for identifying the informative variables involved in gene-gene (GGI) and gene-environment interactions (GEI) that are associated with disease phenotypes. The AMBIENCE algorithm uses a novel information theoretic metric called phenotype-associated information (PAI) to search for combinations of genetic variants and environmental variables associated with the disease phenotype. The PAI-based AMBIENCE algorithm effectively and efficiently detected GEI in simulated data sets of varying size and complexity, including the 10K simulated rheumatoid arthritis data set from Genetic Analysis Workshop 15.
View Article and Find Full Text PDFThe purpose of our work was to develop heuristics for visualizing and interpreting gene-environment interactions (GEIs) and to assess the dependence of candidate visualization metrics on biological and study-design factors. Two information-theoretic metrics, the k-way interaction information (KWII) and the total correlation information (TCI), were investigated. The effectiveness of the KWII and TCI to detect GEIs in a diverse range of simulated data sets and a Crohn disease data set was assessed.
View Article and Find Full Text PDF