Code and data for the study "Entropy of morphological systems is modulated by functional and semantic properties" by F. Franzon (@franfranz) and C. Zanini (@chzani) (Article; Preprint).
This version includes the final dataset used in the study and the code for the analysis and for printing the graphs.
- Datasets of all nouns, the animate sample, and the control sample (wd_in).
This version includes the analysis code as well as the code that preprocesses the data and builds the dataset analyzed in the study:
- 00 EMS Full - Merge Text Resources
- 01 EMS Full - Compare Noun distributions
- 02 EMS Full - Compute Context Entropy
- 03 EMS Full - Context Entropy across Features
- 04 EMS Full - Print Graphs
- Language resources used in the study, from itWaC and Morph-it! [1, 2] (wd0).
- Merged dataset with all nouns, animate sample, control sample (wd1).
- Text files collecting the 10-word windows surrounding the words in the animate and control samples, extracted from the itWaC corpus (wd2). Please note that this folder contains only a small sample of nouns; the full dataset is available on request.
- Datasets of animate sample and control sample, with Context Entropy measures (wd3).
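This README does not spell out how Context Entropy is computed. As a hedged illustration only, the sketch below assumes it is the Shannon entropy, in bits, of the distribution of words that co-occur with a target noun inside its 10-word windows (as stored in wd2), i.e. H = -Σ p(w) log2 p(w). The helper `context_entropy` and its token-list input are hypothetical, and the measure computed in "02 EMS Full - Compute Context Entropy" may differ.

```python
import math
from collections import Counter
from typing import Iterable, List


def context_entropy(windows: Iterable[List[str]], target: str) -> float:
    """Shannon entropy (bits) of the context-word distribution around `target`.

    Hypothetical helper, not the study's code: assumes each window is a list
    of tokens that includes the target word itself, which is excluded.
    """
    # Count every context token across all windows, skipping the target noun.
    counts = Counter(
        tok for window in windows for tok in window if tok != target
    )
    total = sum(counts.values())
    if total == 0:
        # No context words observed: define entropy as zero.
        return 0.0
    # H = -sum_w p(w) * log2 p(w), with p(w) = count(w) / total.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    windows = [["il", "gatto", "dorme"], ["un", "gatto", "nero"]]
    print(context_entropy(windows, "gatto"))
```

With four distinct context words, each seen once, the example prints 2.0 (log2 of 4); a noun whose windows always contain the same context word would get an entropy of 0.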
[1] Baroni, M., Bernardini, S., Ferraresi, A., and Zanchetta, E. (2009). The WaCky wide web: a collection of very large linguistically processed web-crawled corpora. Language Resources and Evaluation, 43(3):209–226.
[2] Zanchetta, E. and Baroni, M. (2005). Morph-it! A free corpus-based morphological resource for the Italian language. Corpus Linguistics, 1(1):2005.