In the prose style transfer task, a system is given input text and a target prose style and produces output that preserves the meaning of the input while altering its style. Such systems require parallel data for evaluation and usually use parallel data for training as well. Currently, few corpora for this task are publicly available. In this work, we identify a high-quality source of aligned, stylistically distinct text in different versions of the Bible. We provide a standardized split of the public domain versions in our corpus into training, development, and testing data. The corpus is highly parallel because it includes many Bible versions, and sentences are aligned by the chapter and verse numbers present in every version. In addition to the corpus, we present the results, measured by the BLEU and PINC metrics, of several models trained on our data, which can serve as baselines for future research. While we present these data as a style transfer corpus, we believe that it is of unmatched quality and may be useful for other natural language tasks as well.
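As a rough illustration of the two mechanics the abstract mentions, the sketch below aligns two versions by their (book, chapter, verse) keys and computes a PINC score, which rewards n-grams in the output that do not appear in the source. It is a minimal sketch, not the authors' pipeline: the dictionary layout, the function names `align_versions` and `pinc`, whitespace tokenization, lowercasing, and the choice to skip n-gram orders longer than the candidate are all assumptions made here for illustration, and the verse texts are toy placeholders.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical key layout: (book, chapter, verse). The corpus's real
# file format is not described in the abstract.
VerseKey = Tuple[str, int, int]


def align_versions(
    source: Dict[VerseKey, str], target: Dict[VerseKey, str]
) -> List[Tuple[VerseKey, str, str]]:
    """Pair up verses that exist in both versions, ordered by key."""
    shared = sorted(source.keys() & target.keys())
    return [(key, source[key], target[key]) for key in shared]


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of distinct n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def pinc(source: str, candidate: str, max_n: int = 4) -> float:
    """PINC: average over n of the fraction of candidate n-grams
    that do NOT occur in the source (higher means more rewording).

    Assumption: orders for which the candidate has no n-grams are
    skipped rather than counted as zero.
    """
    src_tokens = source.lower().split()
    cand_tokens = candidate.lower().split()
    scores = []
    for n in range(1, max_n + 1):
        cand_ngrams = ngrams(cand_tokens, n)
        if not cand_ngrams:
            continue
        overlap = len(cand_ngrams & ngrams(src_tokens, n))
        scores.append(1.0 - overlap / len(cand_ngrams))
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Toy placeholder text, not actual verse content.
    version_a = {("Gen", 1, 1): "a b c d", ("Gen", 1, 2): "e f"}
    version_b = {("Gen", 1, 1): "a b x y", ("Gen", 1, 3): "z"}
    for key, src, tgt in align_versions(version_a, version_b):
        print(key, round(pinc(src, tgt), 3))
```

Because PINC only measures divergence from the source, it is usually read alongside BLEU against a reference: an output can score high on PINC simply by being unrelated to the input.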
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6227951 | PMC |
| http://dx.doi.org/10.1098/rsos.171920 | DOI Listing |
J Am Med Inform Assoc
September 2024
Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO 80045, United States.
Objective: Investigate the use of advanced natural language processing models to streamline the time-consuming process of writing and revising scholarly manuscripts.
Materials And Methods: For this purpose, we integrate large language models into the Manubot publishing ecosystem to suggest revisions for scholarly texts. Our AI-based revision workflow employs a prompt generator that incorporates manuscript metadata into templates, generating section-specific instructions for the language model.
Heliyon
February 2024
College of Literature, Chongqing Normal University, Chongqing, 401331, China.
With advances in science, problems in speech, image, and other kinds of analysis have gradually been solved more effectively, but Chinese text analysis remains a complex problem. It requires not only statistics but also semantic comprehension. Different text types require different language-style feature models to obtain good recognition results.
Nat Methods
March 2024
Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO, USA.
Intrinsically disordered regions (IDRs) are ubiquitous across all domains of life and play a range of functional roles. While folded domains are generally well described by a stable three-dimensional structure, IDRs exist in a collection of interconverting states known as an ensemble. This structural heterogeneity means that IDRs are largely absent from the Protein Data Bank, contributing to a lack of computational approaches to predict ensemble conformational properties from sequence.
Purpose: This mixed methods study developed multiple question types to understand and measure women's perceived benefit from adjuvant endocrine therapy. We hypothesize that patients do not understand this benefit and sought to develop the questions needed to test this hypothesis and obtain initial patient estimates.
Methods: From 8/2022 to 3/2023, qualitative interviews focused on assessing and modifying 9 initial varied question types asking about the overall survival (OS) benefit from adjuvant endocrine therapy.