A closed-domain question answering (QA) dataset for statistical metadata is important for building an effective QA system about statistics. Such a dataset can be used to train or fine-tune QA models in the statistical domain, and to evaluate the effectiveness of QA methods in that domain. In this research, we build a new Indonesian-language dataset, called StatMetaQA (Statistical Metadata Question Answering), consisting of statistical metadata documents and question-answer pair annotations of these documents. The collection of statistical metadata documents serves as the knowledge base of a QA system, while the collection of question-answer pair annotations is used to train or fine-tune QA models. The document collection, consisting of 861 statistical activity metadata documents and 1,231 statistical indicator metadata documents, was obtained from a website managed by Statistics Indonesia (http://sirusa.bps.go.id). The collection of question-answer pairs about statistical metadata, consisting of 28,863 question-answer pairs from 1,000 statistical metadata documents, was obtained using two strategies: human annotation and automatic annotation. Of these, 7,353 question-answer pairs were manually annotated by humans, and 21,510 question-answer pairs were generated automatically by applying our predefined templates to selected document fields of the statistical metadata.
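The automatic annotation strategy described above, applying predefined templates to document fields, can be illustrated with a minimal Python sketch. The field names (`title`, `definition`, `unit`, `organizer`), the template wording, and the `generate_qa_pairs` helper are all assumptions made for illustration; they are not the actual StatMetaQA templates or the sirusa.bps.go.id schema, and the real templates are written in Indonesian.

```python
from dataclasses import dataclass


@dataclass
class QAPair:
    question: str
    answer: str
    doc_id: str
    field: str


# Hypothetical templates keyed by hypothetical metadata field names.
# Each template turns one document field into one question.
TEMPLATES = {
    "definition": "What is the definition of {title}?",
    "unit": "What is the unit of measurement of {title}?",
    "organizer": "Which institution conducts {title}?",
}


def generate_qa_pairs(doc):
    """Generate one QA pair per non-empty templated field of a metadata document."""
    title = str(doc.get("title", "")).strip()
    if not title:
        # Without a title there is nothing to ask about.
        return []
    pairs = []
    for field, template in TEMPLATES.items():
        answer = str(doc.get(field, "")).strip()
        if not answer:
            # Skip fields that are missing or empty in this document.
            continue
        pairs.append(
            QAPair(
                question=template.format(title=title),
                answer=answer,
                doc_id=str(doc.get("id", "")),
                field=field,
            )
        )
    return pairs


if __name__ == "__main__":
    # Made-up document used only to exercise the sketch.
    example = {
        "id": "ind-001",
        "title": "Example Indicator",
        "definition": "An illustrative definition.",
        "unit": "",
        "organizer": "Example Agency",
    }
    for pair in generate_qa_pairs(example):
        print(pair.question, "->", pair.answer)
```

In this sketch the answer is copied verbatim from the field value, so each generated pair stays traceable to a specific document and field; empty fields produce no pair rather than an unanswerable question.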
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11416616 | PMC |
| http://dx.doi.org/10.1016/j.dib.2024.110816 | DOI Listing |
J Med Internet Res
March 2025
Division of Biomedical and Public Health Ethics, Department of General Health Studies, Karl Landsteiner University of Health Sciences, Krems, Austria.
Background: Smartphone mobile health (mHealth) apps have the potential to enhance access to health care services and address health care disparities, especially in low-resource settings. However, when developed without attention to equity and inclusivity, mHealth apps can also exacerbate health disparities. Understanding and creating solutions for the disparities caused by mHealth apps is crucial for achieving health equity.
PeerJ Comput Sci
February 2025
Heinz Nixdorf Chair of Distributed Information Systems, Friedrich-Schiller Universität Jena, Jena, Thuringia, Germany.
Artificial intelligence (AI) is revolutionizing biodiversity research by enabling advanced data analysis, species identification, and habitat monitoring, thereby enhancing conservation efforts. Ensuring reproducibility in AI-driven biodiversity research is crucial for fostering transparency, verifying results, and promoting the credibility of ecological findings. This study investigates the reproducibility of deep learning (DL) methods within biodiversity research.
Sci Rep
March 2025
Grupo Informática de Biossistemas, Bioengenharia e Genômica, Instituto René Rachou, Fiocruz Minas, Av. Augusto de Lima, 1715, Barro Preto, Belo Horizonte, MG, Brazil.
The integration of sequenced samples and clinical data from independent yet related studies in public domain databases, such as The Sequence Read Archive (SRA), has the potential to increase sample sizes and enhance the statistical power needed for more precise bioinformatic analysis. Data mining and sample grouping are the starting points in this process and still present several challenges, including the presence of structured and unstructured data, missing deposited data, and varying experimental conditions and techniques applied across the studies. Designed to address the main challenges of data mining and sample grouping for biomarker research, the proposed methodology employs a computational approach that integrates relational database construction, text and data mining, natural language processing, network analysis, searches of PubMed publications, and a combination of the MeSH, TTD, and WordNet databases to identify groups of samples with the same characteristics.
Background: TikTok's MedTok is an interconnected network of patients, providers, and producers sharing knowledge and experiences of health-related topics. Awareness of popular content on weight loss medications can benefit healthcare professionals, especially regarding side effects and management.
Objectives: Describe content in popular TikTok videos using side effect hashtags for gastric inhibitory peptide (GIP) and glucagon-like peptide-1 (GLP-1) receptor agonists.
Front Nutr
February 2025
Department of Environmental Sciences, Jožef Stefan Institute, Ljubljana, Slovenia.
The IsoFoodTrack database is a comprehensive, scalable, and flexible platform designed to manage isotopic and elemental composition data for a wide range of food commodities. It supports research in food authenticity and fraud detection by integrating isotopic data with rich metadata, including geographical, production, and methodological details. The database is built for scalability, allowing the addition of new commodities, analytical methods, and metadata fields, while ensuring interoperability with external databases through standardized formats and API integration.