The Entropy of Digital Texts: The Mathematical Background of Correctness

Entropy (Basel)

Faculty of Informatics, University of Debrecen, Kassai út 26., 4028 Debrecen, Hungary.

Published: February 2023

Based on Shannon's communication theory, this paper provides the theoretical background for an objective measure, the text-entropy, that describes the quality of digital natural-language documents handled with word processors. The text-entropy can be calculated from the formatting, correction, and modification entropy, and these values indicate how correct or how erroneous a digital text-based document is. To show how the theory applies to real-world texts, three erroneous MS Word documents were selected for the study. These examples demonstrate how to build the correcting, formatting, and modification algorithms for each document, and how to calculate the time spent on modification and the entropy of the completed tasks in both the original erroneous and the corrected documents. In general, using and modifying properly edited and formatted digital texts was found to require fewer or an equal number of knowledge-items. In information-theoretic terms, less data must be put on the communication channel than for erroneous documents. The analysis also revealed that in the corrected documents not only is the quantity of data smaller, but the quality of the data (knowledge pieces) is higher. As a consequence of these two findings, it is proven that the modification time of erroneous documents is several times that of correct ones, even for minimal first-level actions. It is also proven that, to avoid repeating time- and resource-consuming actions, documents must be corrected before they are modified.
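The abstract does not reproduce the paper's formulas, so the following is only a minimal sketch of the underlying idea. It assumes that the steps of an editing algorithm can be treated as symbols on a communication channel and scored with the standard Shannon entropy, H = -Σ p·log2(p). The function name `shannon_entropy` and both action lists are hypothetical illustrations, not data or definitions taken from the study.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Return the Shannon entropy, in bits, of the empirical
    distribution of the items in `symbols`."""
    counts = Counter(symbols)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # an empty action log carries no information
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical action logs (assumptions for illustration only):
# inserting one sentence into a badly formatted document forces many
# different repair actions, while a properly formatted one needs few.
erroneous_doc_actions = [
    "delete_space", "delete_space", "set_indent", "delete_tab",
    "set_alignment", "delete_empty_paragraph", "set_spacing", "type_text",
]
corrected_doc_actions = ["type_text", "type_text", "type_text", "apply_style"]

h_erroneous = shannon_entropy(erroneous_doc_actions)
h_corrected = shannon_entropy(corrected_doc_actions)
print(f"erroneous: {h_erroneous:.3f} bits, corrected: {h_corrected:.3f} bits")
```

In this toy setup the erroneous document's action log has the higher entropy, which mirrors the abstract's claim in spirit only; the paper's own formatting, correction, and modification entropies may be defined differently.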

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9955509
DOI: http://dx.doi.org/10.3390/e25020302
