Background: The amount of data in health care is rapidly rising, leading to multiple datasets generated for any given individual. Data integration involves mapping variables in different datasets together to form a combined dataset which can then be used to conduct different types of analyses. However, with increasing numbers of variables, manual mapping of a dataset can become inefficient. Another approach is to use text classification through machine learning to classify the variables to a schema.
Objectives: Our aim was to develop and evaluate machine learning methods for integrating datasets across health information-seeking behavior (HISB) databases.
Methods: Four online databases relevant to the research field were selected for integration. Two experiments were designed for dataset mapping: intra-database mapping, using a single data source, and inter-database mapping, which maps datasets between the four databases. We compared logistic regression (LR), random forest classifier (RFC), and neural network (NN) models by F1-score for these two methods of integration. A third experiment was an ablation study that used all the available data to create a model for classifying HISB variables in a dataset.
Results: In intra-database mapping, the mean F1 score of the LR classifier (0.787) was higher than that of the RFC (0.767) and the fully connected NN (0.735). In inter-database mapping, LR scored best (0.245); however, this depended on which database was used as the training source. Using all the databases, the top three models were able to correctly classify 90-91% of the variables. Removing one dataset improved scores and resulted in a model able to correctly classify 95-96% of the HISB variables.
Conclusions: As part of data integration, a neural network can be used to map the variables of a dataset. The developed models can be used to classify the HISB terms in a database.
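
The models above are compared by F1-score, the harmonic mean of precision and recall. The following is a minimal sketch of how a per-class and macro-averaged F1 could be computed for a variable-to-schema classifier. The abstract does not state which averaging scheme was used, so macro averaging is an assumption here, and the labels (`source`, `topic`, `trust`) and function names are hypothetical placeholders, not categories or code from the study.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: F1} from paired true and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label was wrong
            fn[truth] += 1  # true label was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical schema labels assigned to six dataset variables
y_true = ["source", "topic", "source", "trust", "topic", "trust"]
y_pred = ["source", "topic", "topic", "trust", "topic", "source"]

per_class = f1_per_class(y_true, y_pred)
overall = macro_f1(y_true, y_pred)
```

In this toy example, `topic` has two true positives and one false positive, giving an F1 of 0.8, while `source` and `trust` each lose points for one misclassified variable. Macro averaging weights every schema category equally, which matters when some categories have few variables.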
DOI: http://dx.doi.org/10.1016/j.sapharm.2022.08.001