This data article presents a dataset for Siswati, a Bantu language of the Nguni group that is one of the eleven official South African languages and, together with English, an official language of Eswatini. The dataset contains parallel textual data between English and Siswati as well as monolingual data for Siswati, and was developed as training data for machine translation systems, specifically the Autshumato machine translation project. Both corpora can also be used for the development and evaluation of Natural Language Processing (NLP) core technologies for Siswati. In addition, the data lends itself to corpus linguistic studies. The article describes how the data was collected, what types of texts it contains, and what clean-up was done. It also provides an overview of the number of words contained in the datasets.
Source links:
- PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11010775
- DOI: http://dx.doi.org/10.1016/j.dib.2024.110325
Br J Psychol
January 2025
Department of Psychological and Cognitive Sciences, Tsinghua University, Beijing, China.
How to raise donations effectively, especially in the digital era, has puzzled fundraisers and scientists across various disciplines. Our research focuses on donation-based crowdfunding projects and investigates how the emotional valence expressed verbally (in textual descriptions) and visually (in facial images) in project descriptions affects project performance. Study 1 uses field data (N = 3817): it collects project information and descriptions from a top donation-based crowdfunding platform, computes visual and verbal emotional valence using a deep-learning-based affective computing method, and analyses how multimodal emotional valence influences donation outcomes.
NPJ Digit Med
January 2025
Eye Institute and Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, China.
Chatbot-based multimodal AI holds promise for collecting medical histories and diagnosing ophthalmic diseases using textual and imaging data. This study developed and evaluated the ChatGPT-powered Intelligent Ophthalmic Multimodal Interactive Diagnostic System (IOMIDS) to enable patient self-diagnosis and self-triage. IOMIDS included a text model and three multimodal models (text + slit-lamp, text + smartphone, text + slit-lamp + smartphone).
JMIR Cancer
January 2025
Division of Radiology and Biomedical Engineering, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan.
Background: The application of natural language processing in medicine has increased significantly, spanning tasks such as information extraction and classification. Natural language processing plays a crucial role in structuring free-form radiology reports, facilitating the interpretation of textual content, and enhancing data utility through clustering techniques. Clustering allows for the identification of similar lesions and disease patterns across a broad dataset, making it useful for aggregating information and discovering new insights in medical imaging.
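The idea of grouping similar free-text reports can be illustrated with a toy sketch. This is not the authors' method: it uses invented example reports, plain bag-of-words vectors, and a greedy single-pass assignment rather than the clustering pipeline the study describes.

```python
from collections import Counter
import math

def bag_of_words(text):
    # Tokenize on whitespace and count term frequencies.
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    common = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster_reports(reports, threshold=0.3):
    # Greedy single-pass clustering: assign each report to the first
    # cluster whose representative is similar enough, else start a new one.
    clusters = []  # list of (representative_vector, member_indices)
    for i, text in enumerate(reports):
        vec = bag_of_words(text)
        for rep, members in clusters:
            if cosine_similarity(vec, rep) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]

# Hypothetical report snippets for illustration only.
reports = [
    "small nodule in right upper lobe",
    "nodule noted in the right upper lobe",
    "no acute intracranial abnormality",
]
print(cluster_reports(reports))  # → [[0, 1], [2]]
```

Real systems would replace the bag-of-words vectors with learned embeddings and the greedy pass with a proper clustering algorithm, but the grouping principle (similar reports score high, dissimilar ones start new clusters) is the same.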
Sensors (Basel)
January 2025
Department of Architectural Engineering, Dankook University, 152 Jukjeon-ro, Yongin-si 16890, Republic of Korea.
In the construction industry, ensuring the proper installation, retention, and dismantling of temporary structures, such as jack supports, is critical to maintaining safety and project timelines. However, inconsistencies between on-site data and construction documentation remain a significant challenge. To address this gap, the study proposes an integrated monitoring framework that combines computer-vision-based object detection and document recognition techniques.
Sensors (Basel)
January 2025
Department of Informatics-Science and Engineering (DISI), University of Bologna, 40126 Bologna, Italy.
Person re-identification (re-id) is a critical computer vision task aimed at identifying individuals across multiple non-overlapping cameras, with wide-ranging applications in intelligent surveillance systems. Despite recent advances, the domain gap (performance degradation when models encounter unseen datasets) remains a critical challenge. CLIP-based models, leveraging multimodal pre-training, offer potential for mitigating this issue by aligning visual and textual representations.
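The alignment idea behind CLIP-style models can be sketched in miniature: matching image and text embeddings should score highest under cosine similarity in a shared space. The 3-d vectors below are toy stand-ins for real CLIP embeddings, not output from the authors' pipeline.

```python
import math

def cosine(u, v):
    # Cosine similarity between two dense embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def match_image_to_text(image_emb, text_embs):
    # CLIP-style retrieval: return the index of the caption whose
    # embedding best aligns with the image embedding.
    scores = [cosine(image_emb, t) for t in text_embs]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy 3-d embeddings standing in for CLIP outputs.
image = [0.9, 0.1, 0.0]
captions = [[0.0, 1.0, 0.0], [1.0, 0.2, 0.0]]
print(match_image_to_text(image, captions))  # → 1
```

In an actual CLIP model the two embeddings come from separate image and text encoders trained jointly so that paired inputs land close together; the retrieval step itself is exactly this nearest-neighbour comparison.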