Text reuse reveals meaningful reiterations of text in large corpora. Humanities researchers use text reuse to study, for example, the later reception of influential texts or to reveal the evolving publication practices of historical media. This research is often supported by interactive visualizations that highlight relations and differences between text segments. In this paper, we build on earlier work in this domain. We present Text Reuse at Scale, to our knowledge the first interface that integrates text reuse data with other forms of semantic enrichment to enable versatile and scalable exploration of intertextual relations in historical newspaper corpora. The Text Reuse at Scale interface was developed as part of the project and combines powerful search and filter operations with close and distant reading perspectives. We integrate text reuse data with enrichments derived from topic modeling, named entity recognition and classification, and language and document type detection, as well as a rich set of newspaper metadata. We report on historical research objectives and common user tasks for the analysis of historical text reuse data, and we present the prototype interface together with the results of a user evaluation.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10654985
DOI: http://dx.doi.org/10.3389/fdata.2023.1249469


