Background: Gene selection is an important part of microarray data analysis because it provides information that can lead to a better mechanistic understanding of an investigated phenomenon. At the same time, gene selection is very difficult because of the noisy nature of microarray data. As a consequence, gene selection is often performed with machine learning methods. The Random Forest method is particularly well suited for this purpose. In this work, four state-of-the-art Random Forest-based feature selection methods were compared in a gene selection context. The analysis focused on the stability of selection because, although it is necessary for determining the significance of results, it is often ignored in similar studies.
Results: A comparison of the post-selection accuracy of Random Forest classifiers, assessed by validation, revealed that all investigated methods were equivalent in this respect. However, the methods differed substantially in the number of selected genes and in the stability of selection. Of the analysed methods, the Boruta algorithm predicted the most genes as potentially important.
Conclusions: The post-selection classifier error rate, a frequently used measure, was found to be a potentially deceptive indicator of gene selection quality. When the number of consistently selected genes was considered, the Boruta algorithm was clearly the best. Although it was also the most computationally intensive method, its computational demands could be reduced to levels comparable to those of the other algorithms by replacing the Random Forest importance with a comparable measure from Random Ferns (a similar but simplified classifier). Despite their design assumptions, the minimal optimal selection methods were found to select a high fraction of false positives.
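To make "stability of selection" and "consistently selected genes" concrete, here is a minimal sketch of how they could be quantified from repeated selection runs. The abstract does not state the exact stability measure used in the study. The mean pairwise Jaccard similarity, the `min_fraction` threshold, the function names, and the toy gene sets below are all illustrative assumptions rather than the authors' procedure.

```python
from collections import Counter
from itertools import combinations


def mean_pairwise_jaccard(selections):
    """Average Jaccard similarity over all pairs of selected-gene sets.

    `selections` is a list of sets, one per selection run (for example,
    one per resampling of the data). This is an assumed stability score,
    not necessarily the one used in the study.
    """
    pairs = list(combinations(selections, 2))
    if not pairs:
        raise ValueError("need at least two selection runs")
    total = 0.0
    for a, b in pairs:
        union = a | b
        # Two empty selections are treated as perfectly agreeing.
        total += len(a & b) / len(union) if union else 1.0
    return total / len(pairs)


def consistently_selected(selections, min_fraction=0.9):
    """Genes selected in at least `min_fraction` of the runs (assumed threshold)."""
    counts = Counter(gene for run in selections for gene in run)
    threshold = min_fraction * len(selections)
    return {gene for gene, count in counts.items() if count >= threshold}


# Hypothetical output of three selection runs on resampled data.
runs = [
    {"geneA", "geneB", "geneC"},
    {"geneA", "geneB", "geneD"},
    {"geneA", "geneB"},
]

stability = mean_pairwise_jaccard(runs)
stable_genes = consistently_selected(runs)
```

Two methods with the same post-selection error rate can yield very different values of `stability` and very different sizes of `stable_genes`, which is the kind of difference the abstract reports.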
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3897925 | PMC |
http://dx.doi.org/10.1186/1471-2105-15-8 | DOI Listing |