The linear order of elements in prominent linguistic sequences: Deriving Tns-Asp-Mood orders and Greenberg’s Universal 20 with n-grams
Stela Manova
July 2021
 

Current NLP research uses neither linguistically annotated corpora nor the traditional pipeline of linguistic modules, which raises questions about the future of linguistics. Linguists who have tried to crack the secrets of deep learning NLP models, including BERT (a bidirectional transformer-based ML technique employed for Google Search), have had as their ultimate goal to show that deep nets make linguistic generalizations. I opted for an alternative approach. To check whether it is possible to process natural language without grammar, I developed a very simple model, the End-to-end N-Gram Model (EteNGraM), that elaborates on the standard n-gram model. EteNGraM, at a very basic level, imitates current NLP research by handling semantic relations without semantics. As in NLP, I pre-trained the model on the orders of the TAM (tense-aspect-mood) markers in the verbal domain, fine-tuned it, and then applied it to derive Greenberg’s Universal 20 and its exceptions in the nominal domain. Although EteNGraM is ridiculously simple and operates only with bigrams and trigrams, it successfully derives, and differentiates between, the attested and unattested patterns in Cinque (2005), “Deriving Greenberg's Universal 20 and Its Exceptions”, Linguistic Inquiry 36, and Cinque (2014), “Again on Tense, Aspect, Mood Morpheme Order and the ‘Mirror Principle’”, in Functional Structure from Top to Toe: The Cartography of Syntactic Structures 9. EteNGraM also makes fine-grained predictions about preferred and dispreferred patterns across languages and reveals novel aspects of the organization of the verbal and nominal domains. To explain EteNGraM's highly efficient performance, I address issues such as the complexity of data versus the complexity of analysis, structure building by linear sequences of elements versus hierarchical syntactic trees, and how linguists can contribute to NLP research.
DOI: 10.13140/RG.2.2.17715.55843
Reference: lingbuzz/006082
(please use that when you cite this article)
Published in: Manuscript
keywords: nlp, bert, theoretical linguistics, typology, n-grams, deep learning, complexity, trees, tense-aspect-mood, greenberg's universal 20, semantics, morphology, syntax
previous versions: v2 [July 2021]
v1 [July 2021]

 
