Text Reuse at Scale. An interface for the exploration of text reuse data in semantically enriched historical newspapers.

Marten Düring, Matteo Romanello, Maud Ehrmann, Kaspar Beelen, Daniele Guido, Brecht Deseure, Estelle Bunout, Jana Keck, Petros Apostolopoulos

Front Big Data

Digital History & Historiography, Luxembourg Centre for Contemporary and Digital History, Esch-sur-Alzette, Luxembourg.

Published: November 2023

Text reuse reveals meaningful reiterations of text in large corpora. Humanities researchers use text reuse to study, for example, the posterior reception of influential texts or the evolving publication practices of historical media. This research is often supported by interactive visualizations that highlight relations and differences between text segments. In this paper, we build on earlier work in this domain. We present Text Reuse at Scale, to our knowledge the first interface that integrates text reuse data with other forms of semantic enrichment to enable versatile and scalable exploration of intertextual relations in historical newspaper corpora. The Text Reuse at Scale interface was developed as part of the project and combines powerful search and filter operations with close and distant reading perspectives. We integrate text reuse data with enrichments derived from topic modeling, named entity recognition and classification, and language and document type detection, as well as a rich set of newspaper metadata. We report on historical research objectives and common user tasks for the analysis of historical text reuse data, and present the prototype interface together with the results of a user evaluation.

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10654985
DOI: http://dx.doi.org/10.3389/fdata.2023.1249469
