A predominant source of complications in SARS-CoV-2 patients is severe systemic inflammation, which can lead to tissue damage and organ failure. The high inflammatory burden of this viral infection often results in cardiovascular comorbidities. A better understanding of the interaction between immune pathways and cardiovascular proteins might inform medical decisions and therapeutic approaches.
IEEE Trans Pattern Anal Mach Intell
June 2021
Identifying statistical dependence between the features and the label is a fundamental problem in supervised learning. This paper presents a framework for estimating dependence between numerical features and a categorical label using generalized Gini distance, an energy distance in reproducing kernel Hilbert spaces (RKHS). Two Gini distance based dependence measures are explored: Gini distance covariance and Gini distance correlation.
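As a rough illustration of the two measures named above, the sketch below assumes the commonly cited form of Gini distance covariance, gCov = Δ − Σ_k p_k Δ_k, where Δ is the mean pairwise distance over all samples, Δ_k is the mean pairwise distance within class k, and p_k is the class proportion, with Gini distance correlation taken as gCov / Δ. The plain Euclidean distance, the pair-averaging estimator, and the function names are assumptions made for this example; the generalized kernel-induced distance in RKHS is not reproduced here.

```python
import math
from collections import defaultdict
from itertools import combinations


def mean_pairwise_distance(points, dist):
    """Average distance over all unordered pairs; 0.0 if fewer than two points."""
    pairs = list(combinations(points, 2))
    if not pairs:
        return 0.0
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def gini_distance_stats(xs, ys, dist=math.dist):
    """Return (gCov, gCor) for numerical features xs and categorical labels ys.

    Assumed form: gCov = Delta - sum_k p_k * Delta_k, gCor = gCov / Delta.
    Each element of xs is a tuple of coordinates.
    """
    n = len(xs)
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[y].append(x)

    delta = mean_pairwise_distance(xs, dist)  # overall spread
    within = sum(
        (len(g) / n) * mean_pairwise_distance(g, dist)  # p_k * Delta_k
        for g in groups.values()
    )
    gcov = delta - within
    gcor = gcov / delta if delta > 0 else 0.0
    return gcov, gcor


# Two perfectly separated classes: all spread is between classes.
features = [(0.0,), (0.0,), (1.0,), (1.0,)]
labels = ["a", "a", "b", "b"]
gcov, gcor = gini_distance_stats(features, labels)
```

A different distance, such as a kernel-induced one, could be passed through the `dist` argument without changing the rest of the sketch.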
Breast cancer is intrinsically heterogeneous and is commonly classified into four main subtypes associated with distinct biological features and clinical outcomes. However, currently available data resources and methods are largely limited to molecular subtyping based on protein-coding genes, and little is known about the roles of long non-coding RNAs (lncRNAs), which occupy 98% of the whole genome. lncRNAs may also play important roles in subgrouping cancer patients and are associated with clinical phenotypes.
Background: Chemical bioavailability is an important dose metric in environmental risk assessment. Although many approaches have been used to evaluate bioavailability, no single approach is free from limitations. Previously, we developed a new genomics-based approach that integrated microarray technology and regression modeling for predicting bioavailability (tissue residue) of explosive compounds in exposed earthworms.
Cancer is a disease characterized largely by the accumulation of out-of-control somatic mutations during the lifetime of a patient. Distinguishing driver mutations from passenger mutations has posed a challenge in modern cancer research. With the advanced development of microarray experiments and clinical studies, a large number of candidate cancer genes have been extracted, and distinguishing the informative genes among them is essential.
Background: Many biology-related research works combine data from multiple sources in an effort to understand the underlying problems. It is important to find and interpret the most important information from these sources. Thus it will be beneficial to have an effective algorithm that can simultaneously extract decision rules and select critical features for good interpretation while preserving the prediction performance.
Glycogen synthase kinase-3 (GSK-3) is a multifunctional serine/threonine protein kinase which regulates a wide range of cellular processes, involving various signalling pathways. GSK-3β has emerged as an important therapeutic target for diabetes and Alzheimer's disease. To identify structurally novel GSK-3β inhibitors, we performed virtual screening by implementing a combined ligand-based/structure-based approach, which included quantitative structure-activity relationship (QSAR) analysis and docking prediction.
Background: In drug discovery and development, it is crucial to determine which conformers (instances) of a given molecule are responsible for its observed biological activity and at the same time to recognize the most representative subset of features (molecular descriptors). Due to experimental difficulty in obtaining the bioactive conformers, computational approaches such as machine learning techniques are much needed. Multiple Instance Learning (MIL) is a machine learning method capable of tackling this type of problem.
Background: In the context of drug discovery and development, much effort has been exerted to determine which conformers of a given molecule are responsible for the observed biological activity. In this work we aimed to predict bioactive conformers using a variant of supervised learning, named multiple-instance learning. A single molecule, treated as a bag of conformers, is biologically active if and only if at least one of its conformers, treated as an instance, is responsible for the observed bioactivity; and a molecule is inactive if none of its conformers is responsible for the observed bioactivity.
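The bag-labeling rule stated above can be written down directly. The following is a minimal sketch, not the method of the work itself: the function names, the per-conformer scores, and the 0.5 threshold are hypothetical and only illustrate the "active if and only if at least one conformer is active" assumption.

```python
from typing import Sequence


def bag_is_active(instance_is_active: Sequence[bool]) -> bool:
    """A molecule (bag) is active iff at least one conformer (instance) is."""
    return any(instance_is_active)


def predict_bag_active(instance_scores: Sequence[float],
                       threshold: float = 0.5) -> bool:
    """Score-based version of the same rule: the bag takes the score of its
    best conformer. An empty bag is treated as inactive (an assumption)."""
    best = max(instance_scores, default=float("-inf"))
    return best >= threshold


# One responsible conformer is enough to make the molecule active.
molecule_a = bag_is_active([False, True, False])
# No responsible conformer means the molecule is inactive.
molecule_b = bag_is_active([False, False, False])
```

Taking the maximum over conformer scores is one common way to turn instance-level predictions into a bag-level prediction under this assumption; the conformer with the highest score is then the candidate bioactive conformer.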
IEEE Trans Nanobioscience
September 2012
A vast number of biology-related research problems involve combining multiple sources of data to achieve a better understanding of the underlying problems. It is important to select and interpret the most important information from these sources. Thus it will be beneficial to have a good algorithm to simultaneously extract rules and select features for better interpretation of the predictive model.
Int J Bioinform Res Appl
September 2012
Incorporating various sources of biological information is important for biological discovery. For example, genes have a multiview representation. They can be represented by features such as sequence length and pairwise similarities.
Background: It is commonly believed that including domain knowledge in a prediction model is desirable. However, representing and incorporating domain information in the learning process is, in general, a challenging problem. In this research, we consider domain information encoded by discrete or categorical attributes.
Background: Microarray technology has made it possible to simultaneously monitor the expression levels of thousands of genes in a single experiment. However, the large number of genes greatly increases the challenges of analyzing, comprehending and interpreting the resulting mass of data. Selecting a subset of important genes is necessary to address the challenge.
BMC Bioinformatics
November 2007
Background: Mean-based clustering algorithms such as bisecting k-means generally lack robustness. Although componentwise median is a more robust alternative, it can be a poor center representative for high dimensional data. We need a new algorithm that is robust and works well in high dimensional data sets e.
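The robustness contrast between a mean-based center and the componentwise median can be seen on a toy cluster. The sketch below is only an illustration with made-up points; it does not implement bisecting k-means or the new algorithm the abstract goes on to propose, and it does not address the high-dimensional weakness of the componentwise median.

```python
from statistics import mean, median


def componentwise_mean(points):
    """Center whose each coordinate is the mean of that coordinate."""
    return [mean(column) for column in zip(*points)]


def componentwise_median(points):
    """Center whose each coordinate is the median of that coordinate."""
    return [median(column) for column in zip(*points)]


# Three nearby points plus one gross outlier (hypothetical data).
cluster = [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0), (100.0, 200.0)]

mean_center = componentwise_mean(cluster)      # pulled toward the outlier
median_center = componentwise_median(cluster)  # stays near the bulk of the data
```

Here the mean center is dragged to (26.5, 51.5) by the single outlier, while the componentwise median stays at (2.5, 2.5).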
BMC Bioinformatics
September 2006
Background: Recursive Feature Elimination is a common and well-studied method for reducing the number of attributes used for further analysis or development of prediction models. The effectiveness of the RFE algorithm is generally considered excellent, but the primary obstacle in using it is the amount of computational power required.
Results: Here we introduce a variant of RFE which employs ideas from simulated annealing.
The MidSouth Computational Biology and Bioinformatics Society (MCBIOS) describes its efforts to provide local opportunities for researchers to learn and connect with colleagues.
This paper contains a description of several common normalization methods used in microarray analysis, and compares the effect of these methods on microarray data. The importance of background subtraction is also addressed. The research focuses on three parts.
Treatment of pediatric acute lymphoblastic leukemia (ALL) is based on the concept of tailoring the intensity of therapy to a patient's risk of relapse. To determine whether gene expression profiling could enhance risk assignment, we used oligonucleotide microarrays to analyze the pattern of genes expressed in leukemic blasts from 360 pediatric ALL patients. Distinct expression profiles identified each of the prognostically important leukemia subtypes, including T-ALL, E2A-PBX1, BCR-ABL, TEL-AML1, MLL rearrangement, and hyperdiploid >50 chromosomes.