Languages optimize the trade-off between lexicon size and average utterance length: A case study of numeral systems
Milica Denić, Jakub Szymanik
February 2024

Human languages vary in which meanings they lexicalize, but there are important constraints on this variation. It has been argued that languages are under two competing pressures: the pressure to be simple (e.g., to have a small lexicon) and the pressure to allow for informative (i.e., precise) communication with their lexical items, and that which meanings get lexicalized may be explained by languages finding a good way to trade off between these two pressures (Kemp and Regier, 2012, and much subsequent work). However, in certain semantic domains, it is possible to reach very high levels of informativeness even if very few meanings from that domain are lexicalized. This is due to productive morphosyntax, which may allow for the construction of meanings that are not lexicalized. Consider the semantic domain of natural numbers: many languages lexicalize few natural number meanings as monomorphemic expressions, but can precisely convey any natural number meaning using morphosyntactically complex numerals. In such semantic domains, lexicon size is not in direct competition with informativeness. What explains which meanings are lexicalized in such semantic domains? We will argue that in such cases, languages are (near-)optimal solutions to a different kind of trade-off problem: the trade-off between the pressure to lexicalize as few meanings as possible (i.e., to minimize lexicon size) and the pressure to produce utterances that are as morphosyntactically simple as possible (i.e., to minimize the average morphosyntactic complexity of utterances). This study, in conjunction with previous work on communicative efficiency, suggests that, in order to explain which meanings get lexicalized across languages and across semantic domains, a more general approach may be that languages are finding a good way to trade off between not two but three pressures: be simple, be informative, and minimize the average morphosyntactic complexity of utterances.
[Note: the title of the paper has been updated to 'Recursive numeral systems optimize the trade-off between lexicon size and average morphosyntactic complexity'.]
Reference: lingbuzz/006748
(please use that when you cite this article)
Published in: Cognitive Science (in press)
keywords: numerals; number; simplicity; informativeness; average utterance length; trade-off; semantics
previous versions: v3 [November 2023]
v2 [January 2023]
v1 [August 2022]
Downloaded: 956 times

