Stakeholders of software development projects have various information needs for making rational decisions during their daily work. Satisfying these needs requires substantial knowledge of where and how the relevant information is stored, and it consumes valuable time that is often not available. Reducing the need for this knowledge is a well-suited text-to-SQL benchmark problem, a field in which public datasets are scarce and needed. We propose the SEOSS-Queries dataset, consisting of natural language utterances and accompanying SQL queries collected from previous studies, software projects, issue tracking tools, and expert surveys to cover a large variety of information-need perspectives. The dataset consists of 1,162 English utterances translating into 166 SQL queries; each query has four precise utterances and three more general ones (166 × 7 = 1,162). Furthermore, the dataset contains 393,086 labeled utterances extracted from issue tracker comments. We provide pre-trained SQLNet and RatSQL baseline models for benchmark comparisons and a replication package facilitating a seamless application, and we discuss various other tasks that may be solved and evaluated using the dataset. The whole dataset with paraphrased natural language utterances and SQL queries is hosted at figshare.com/s/75ed49ef01ac2f83b3e2.
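The relationship between queries and utterances described above can be illustrated with a minimal Python sketch. Only the counts (166 queries, four precise and three general utterances each) come from the abstract; the `QueryEntry` layout, the `issue` table, its `status` column, and the example wording are hypothetical and are not taken from the published dataset files.

```python
from dataclasses import dataclass, field

# Counts stated in the abstract.
NUM_QUERIES = 166
SPECIFIC_PER_QUERY = 4
GENERAL_PER_QUERY = 3


@dataclass
class QueryEntry:
    """One SQL query with its paraphrased utterances (hypothetical layout)."""
    sql: str
    specific: list[str] = field(default_factory=list)
    general: list[str] = field(default_factory=list)

    def pairs(self) -> list[tuple[str, str]]:
        """Flatten the entry into (utterance, sql) pairs for text-to-SQL training."""
        return [(utterance, self.sql) for utterance in self.specific + self.general]


def expected_total_utterances() -> int:
    """Total utterances implied by the per-query counts."""
    return NUM_QUERIES * (SPECIFIC_PER_QUERY + GENERAL_PER_QUERY)


# Illustrative entry only: table name, column name, and wording are invented.
example = QueryEntry(
    sql="SELECT COUNT(*) FROM issue WHERE status = 'Open'",
    specific=[
        "Count the issues whose status is Open.",
        "How many issues have the status Open?",
        "Return the number of issues with status Open.",
        "Give me the count of issues that are in status Open.",
    ],
    general=[
        "How many issues are still open?",
        "What is the number of unresolved issues?",
        "How much open work is left?",
    ],
)

if __name__ == "__main__":
    print(expected_total_utterances())  # 1162
    for utterance, sql in example.pairs():
        print(f"{utterance!r} -> {sql}")
```

Under these assumptions, every utterance of an entry maps to the same SQL string, which is the many-to-one pairing a text-to-SQL model is trained and evaluated on.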
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9079685 | PMC |
| http://dx.doi.org/10.1016/j.dib.2022.108211 | DOI Listing |