Non-standardised early vernaculars pose a problem for search tools because of their high degree of variation. Titles, incipits, and explicits in manuscript copies of the same work can differ in orthography, syntax, and lexicon, and traditional search methods that rely on exact string matching or regular expressions do not handle this variation comprehensively. This project presents a web-based search tool designed specifically to handle linguistic and textual variation. The software is made available as part of the IMEP. The tool addresses variation by combining a database of incipits and explicits, character-based n-gram language models (LMs) built with the SRILM toolkit, and a fuzzy search script (IMEP: FSS) written in Python. It optimizes for recall, retrieving multiple potential matches for a search string without attempting to identify the 'correct' one. A search looks up exact matches in the database while, at the same time, the fuzzy search script works in two steps: first, the incipits and explicits are evaluated against a model of the search string; second, the search string is matched against models of the incipits and explicits. The two-step process shortens processing time. Using SRILM to match the search string against a model of each incipit or explicit in the IMEP gives precision but could take unreasonably long, whereas a first step in which all texts are matched against a single LM built from the search string is much faster. A web application, built using Django and Docker, combines the results of the direct database lookup and the fuzzy search script and presents them as a list, with exact matches first, followed by fuzzy matches ordered by increasing model perplexity. The tool is made available Open Access and can be adapted to other datasets.
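The two-step ranking described in the abstract can be illustrated with a minimal sketch. It is not the IMEP: FSS code and it does not call SRILM: the `CharLM` class is a hypothetical stand-in that uses a character trigram model with add-one smoothing, and the names `fuzzy_search`, `shortlist`, and `vocab_size` are assumptions made for illustration. In particular, cutting the candidates down to a shortlist between the two steps is one plausible reading of how the first step saves time, not something the abstract states.

```python
import math
from collections import Counter

PAD = "\x02"  # hypothetical boundary symbol marking start and end of a text


def char_ngrams(text, n=3):
    """Return the padded, lower-cased character n-grams of a text."""
    padded = PAD * (n - 1) + text.lower() + PAD
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharLM:
    """Character n-gram model with add-one smoothing (stand-in for SRILM)."""

    def __init__(self, text, n=3, vocab_size=64):
        self.n = n
        self.vocab_size = vocab_size  # assumed alphabet size for smoothing
        grams = char_ngrams(text, n)
        self.counts = Counter(grams)
        self.contexts = Counter(g[:-1] for g in grams)

    def perplexity(self, text):
        """Perplexity of `text` under this model; lower means more similar."""
        grams = char_ngrams(text, self.n)
        log_prob = 0.0
        for g in grams:
            p = (self.counts[g] + 1) / (self.contexts[g[:-1]] + self.vocab_size)
            log_prob += math.log(p)
        return math.exp(-log_prob / len(grams))


def fuzzy_search(query, entries, shortlist=3):
    """Return exact matches first, then fuzzy matches by increasing perplexity."""
    exact = [e for e in entries if e.lower() == query.lower()]
    rest = [e for e in entries if e not in exact]

    # Step 1: one model built from the query scores every remaining entry.
    query_lm = CharLM(query)
    candidates = sorted(rest, key=query_lm.perplexity)[:shortlist]

    # Step 2: one model per shortlisted entry scores the query.
    fuzzy = sorted(candidates, key=lambda e: CharLM(e).perplexity(query))
    return exact + fuzzy
```

Step 1 builds a single model and is cheap per entry; step 2 builds one model per candidate, which is why restricting it to a shortlist keeps the cost down in this sketch. A call such as `fuzzy_search("in the beginning", entries)` takes any list of incipit or explicit strings as `entries`.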
Source | Link |
---|---|
PMC | http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10808851 |
DOI | http://dx.doi.org/10.12688/openreseurope.16590.1 |