Heaps' Law and Heaps functions in tagged texts: evidences of their linguistic relevance.

R Soc Open Sci

Centro Atómico Bariloche and Instituto Balseiro, Comisión Nacional de Energía Atómica and Universidad Nacional de Cuyo, Consejo Nacional de Investigaciones Científicas y Técnicas, Av. Bustillo 9500, 8400 San Carlos de Bariloche, Pcia. de Río Negro, Argentina.

Published: March 2020

We study the relationship between vocabulary size and text length in a corpus of 75 literary works in English, authored by six writers, distinguishing between the contributions of three grammatical classes (or 'tags'), and analyse the progressive appearance of new words of each tag along each individual text. We find that, as prescribed by Heaps' Law, vocabulary sizes and text lengths follow a well-defined power-law relation. Meanwhile, the appearance of new words in each text does not obey a power law, and is on the whole well described by the average of random shufflings of the text. Deviations from this average, however, are statistically significant and show systematic trends across the corpus. Specifically, we find that the appearance of new words along each text is predominantly retarded with respect to the average of random shufflings. Moreover, different tags add systematically distinct contributions to this tendency: one tag is more retarded than the mean trend, another is less retarded, and the third follows the overall mean. These statistical systematicities are likely to point to the existence of linguistically relevant information stored in the different variants of Heaps' Law, a feature that is still in need of extensive assessment.
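The comparison described in the abstract, between the growth of vocabulary along a text and the average over random shufflings of the same text, can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `heaps_function` and `shuffled_average`, the number of shufflings, and the toy sentence are assumptions made for illustration, and the sketch ignores grammatical tags and any test of statistical significance, since the abstract does not specify how those were handled.

```python
import random


def heaps_function(tokens):
    """Return V(n) for n = 1..N: distinct tokens seen in the first n tokens."""
    seen = set()
    curve = []
    for tok in tokens:
        seen.add(tok)
        curve.append(len(seen))
    return curve


def shuffled_average(tokens, n_shuffles=100, seed=0):
    """Average the vocabulary-growth curve over random shufflings of the text.

    n_shuffles and seed are illustrative choices, not values from the paper.
    """
    rng = random.Random(seed)
    work = list(tokens)
    totals = [0] * len(work)
    for _ in range(n_shuffles):
        rng.shuffle(work)
        for i, v in enumerate(heaps_function(work)):
            totals[i] += v
    return [t / n_shuffles for t in totals]


# Toy text (hypothetical); a real analysis would use a full tokenized work.
tokens = "the cat saw the dog and the dog saw the cat run".split()

actual = heaps_function(tokens)
average = shuffled_average(tokens)

# Negative values mean new words appear later in the real text than in the
# shuffled average, i.e. their appearance is "retarded" in the abstract's sense.
deviation = [a - m for a, m in zip(actual, average)]
```

Both curves start at 1 and end at the full vocabulary size, so any difference between them lies in how quickly new words accumulate in between.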

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7137977 (PMC)
http://dx.doi.org/10.1098/rsos.200008 (DOI)
