Text reuse reveals meaningful reiterations of text in large corpora. Humanities researchers use text reuse to study, for example, the later reception of influential texts or the evolving publication practices of historical media. This research is often supported by interactive visualizations that highlight relations and differences between text segments. In this paper, we build on earlier work in this domain. We present Text Reuse at Scale, to our knowledge the first interface that integrates text reuse data with other forms of semantic enrichment to enable versatile and scalable exploration of intertextual relations in historical newspaper corpora. The Text Reuse at Scale interface was developed as part of the project and combines powerful search and filter operations with close and distant reading perspectives. We integrate text reuse data with enrichments derived from topic modeling, named entity recognition and classification, and language and document type detection, as well as a rich set of newspaper metadata. We report on historical research objectives and common user tasks for the analysis of historical text reuse data, and present the prototype interface together with the results of a user evaluation.
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10654985 | PMC |
| http://dx.doi.org/10.3389/fdata.2023.1249469 | DOI Listing |