Background and Objective: Healthcare tweets are particularly challenging to classify because of their sparse layout and limited character count. In contrast to previous methods based on the "bag of words" (BOW) model, this study identifies an enrichment protocol and examines how semantically different aspects of feature selection, namely BOW (feature F0), term frequency-inverse document frequency (TF-IDF, feature F1), and latent semantic indexing (LSI, feature F2), improve overall performance when applied sequentially with a classifier.
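The first enrichment step, from F0 to F1, can be illustrated with a minimal standard-library sketch. The toy tweets, the whitespace tokenizer, the function names, and the specific weighting idf = ln(N / df) are assumptions made for illustration; the abstract does not state which TF-IDF variant or preprocessing the study used, and the LSI step (F2) is not shown here.

```python
import math
from collections import Counter

def bow_features(docs):
    """F0: raw term counts per document over a shared, sorted vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer (assumption)
    vocab = sorted({tok for toks in tokenized for tok in toks})
    counts = [Counter(toks) for toks in tokenized]
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix

def tfidf_features(bow_matrix):
    """F1: reweight F0 counts by idf = ln(N / df), one common textbook variant."""
    n_docs = len(bow_matrix)
    n_terms = len(bow_matrix[0])
    # Document frequency: number of documents in which each term occurs
    df = [sum(1 for row in bow_matrix if row[j] > 0) for j in range(n_terms)]
    idf = [math.log(n_docs / d) for d in df]
    return [[row[j] * idf[j] for j in range(n_terms)] for row in bow_matrix]

# Hypothetical toy tweets, not taken from the study's data
tweets = [
    "cough and nausea all night",
    "bad cough again",
    "stomach ache and nausea",
]
vocab, f0 = bow_features(tweets)
f1 = tfidf_features(f0)
```

In this sketch, terms that occur in fewer tweets receive larger weights in `f1` than in `f0`, which is the sense in which F1 enriches F0.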
Methods: To study this enrichment concept, our machine learning (ML) model was tested on two diverse data sets: (i) D1: disease data with tweets related to conjunctivitis, diarrhea, stomach ache, cough and nausea, and (ii) D2: the WebKB4 dataset, using three kinds of classifiers: (a) C1: support vector machine with radial basis function (SVMR), (b) C2: multi-layer perceptron (MLP) and (c) C3: random forest (RF). A partition protocol (K10) was adopted, with different performance metrics used to evaluate the ML system.
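As a rough illustration of the partition step, the sketch below assumes that K10 denotes 10-fold cross-validation; the abstract names the protocol only as "K10", so this reading, the function name `k_fold_indices`, the fixed seed, and the sample count are all assumptions. Feature extraction and the classifiers are left out.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    # Fold i takes every k-th shuffled index, starting at offset i
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx

# Hypothetical data set of 50 samples split into 10 folds
splits = list(k_fold_indices(n_samples=50, k=10))
```

Under this reading, each combination of feature, classifier and data set would be trained on nine folds and scored on the held-out fold, with the metric averaged over the ten rounds.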
Results: Using the combination F1, C1, D1, K10, the ML accuracy was 94%, while with F2, C1, D1, K10, it was 97%. With incremental feature enrichment from F0 to F2 under the K10 protocol, feature F1 improved over F0 by 4.98% on the Disease dataset, while feature F2 improved over F0 by 11.78% on the WebKB4 dataset. We demonstrated generalization over memorization in our ML design. The system was tested for stability and reliability.
Conclusions: We conclude that semantically different aspects of feature selection, when applied sequentially, lead to improved ML accuracy for healthcare data sets. We validated the system on non-healthcare data sets.
DOI: http://dx.doi.org/10.1016/j.cmpb.2019.01.011
Annu Rev Chem Biomol Eng
January 2025
Department of Chemical & Biomolecular Engineering, North Carolina State University, Raleigh, North Carolina, USA.
Understanding the molecular, cellular, and physiological components of neurodegenerative diseases (NDs) is paramount for developing accurate diagnostics and efficacious therapies. However, the complexity of ND pathology and the limitations associated with conventional analytical methods undermine research. Fortunately, microfluidic technology can facilitate discoveries through improved biomarker quantification, brain organoid culture, and small animal model manipulation.
PLoS One
January 2025
Waste Data and Analysis Center, Department of Technology & Society, Stony Brook University, Stony Brook, New York, United States of America.
The composition of solid waste affects technology choices and policy decisions regarding its management. Analyses of waste composition studies are almost always made on a parameter-by-parameter basis. Multivariate distance techniques can create holistic determinations of similarities and differences and were applied here to enhance a series of waste composition comparisons.
Plant Physiol
January 2025
Rothamsted Research, West Common, Harpenden, AL5 2JQ, UK.
The emerging crop Camelina sativa (L.) Crantz (camelina) is a Brassicaceae oilseed with a rapidly growing reputation for the deployment of advanced lipid biotechnology and metabolic engineering. Camelina is recognised by agronomists for its traits including yield, oil/protein content, drought tolerance, limited input requirements, plasticity and resilience.
Genet Epidemiol
January 2025
Centre for Genetics and Genomics Versus Arthritis, Centre for Musculoskeletal Research, Division of Musculoskeletal and Dermatological Sciences, The University of Manchester, Manchester, UK.
Transcriptome-wide association studies (TWAS) investigate the links between genetically regulated gene expression and complex traits. TWAS involves imputing gene expression using expression quantitative trait loci (eQTL) as predictors and testing the association between the imputed expression and the trait. The effectiveness of TWAS depends on the accuracy of these imputation models, which require genotype and gene expression data from the same samples.
Age Ageing
January 2025
Centre for Research in Public Health and Community Care (CRIPACC), University of Hertfordshire, College Lane, Hatfield, UK.
Background: We developed a prototype minimum data set (MDS) for English care homes, assessing feasibility of extracting data directly from digital care records (DCRs) with linkage to health and social care data.
Methods: Through stakeholder development workshops, literature reviews, surveys and public consultation, we developed an aspirational MDS. We identified ways to extract this from existing sources, including DCRs and routine health and social care datasets.