Languages optimize the trade-off between lexicon size and average utterance length: A case study of numeral systems
Milica Denić, Jakub Szymanik
August 2022

Human languages vary in terms of which meanings they lexicalize, but there are important constraints on this variation. It has been argued that languages are under pressure to be simple and to be informative, and that a good compromise between these two pressures determines which meanings get lexicalized (Kemp and Regier, 2012, and much subsequent work). We argue that the way informativeness is operationalized in that line of work is problematic because it assumes a communication model in which interlocutors communicate using only monomorphemic expressions. In reality, however, morphosyntactically complex expressions greatly enrich the range of meanings we are able to express in many semantic domains. One such domain is number: many languages lexicalize few number meanings as monomorphemic expressions, but can precisely convey any number meaning using morphosyntactically complex numerals. We argue that a different notion of communicative efficiency plays a role in which meanings get lexicalized in semantic domains such as number: languages are trying to find a good compromise between the pressure to lexicalize as few meanings as possible (i.e., to minimize lexicon size) and the pressure to produce utterances that are as morphosyntactically simple as possible (i.e., to minimize average utterance length). This case study, in conjunction with previous work on communicative efficiency, suggests that, in order to explain which meanings get lexicalized across languages and across semantic domains, a more general approach may be to assume that languages are finding a good compromise between not two but three pressures: be simple, be informative, and minimize average utterance length.
keywords: numerals; number; simplicity; informativeness; average utterance length; trade-off; semantics
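
The trade-off between lexicon size and average utterance length can be illustrated with a toy sketch. It is not the model used in this paper: the function names `min_morphemes` and `trade_off`, the restriction to addition and multiplication, and the uniform weighting of the numbers 1 to `max_n` are all assumptions made only for illustration. The sketch counts how many lexicalized number words are needed, at minimum, to build each number, and reports that average next to the lexicon size.

```python
from math import inf


def min_morphemes(lexicon, max_n):
    """Fewest lexicalized items needed to build each n in 1..max_n.

    Assumption (illustrative only): complex numerals are formed by
    adding or multiplying smaller numerals, and utterance length is
    the number of lexical items used.
    """
    cost = [inf] * (max_n + 1)
    for n in range(1, max_n + 1):
        if n in lexicon:
            cost[n] = 1  # monomorphemic numeral
            continue
        best = inf
        # n = a + (n - a)
        for a in range(1, n // 2 + 1):
            best = min(best, cost[a] + cost[n - a])
        # n = a * (n // a), with both factors greater than 1
        a = 2
        while a * a <= n:
            if n % a == 0:
                best = min(best, cost[a] + cost[n // a])
            a += 1
        cost[n] = best
    return cost


def trade_off(lexicon, max_n):
    """Return (lexicon size, average utterance length) over 1..max_n.

    Assumption: every number in the range is equally likely to be
    communicated (uniform weighting).
    """
    cost = min_morphemes(lexicon, max_n)
    if any(c == inf for c in cost[1:]):
        raise ValueError("lexicon cannot express every number up to max_n")
    return len(lexicon), sum(cost[1:]) / max_n
```

Under these assumptions, a lexicon containing only the word for 1 is as small as possible but yields long utterances, while lexicalizing 1 through 10 enlarges the lexicon and shortens the average numeral. Comparing `trade_off({1}, 20)` with `trade_off(set(range(1, 11)), 20)` shows the two ends of this toy trade-off.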
