How a simple increase in the number of items can enhance the reliability of judgment data: The case of island experiments
Gert-Jan Schoenmakers
March 2025
 

Replication is an important aspect of experimental research, and it is therefore crucial that collected datasets are reliable. Reliability refers to the precision of measurement and thus informs the replicability of experiments: more precise measurements are more dependable for future reference. Formally defined as the ratio of true score variance to total variance, reliability can be improved by fine-tuning the measurement instrument or by collecting a sufficiently large number of observations, since statistical models fitted to larger samples are better placed to estimate the true score associated with a population. The present paper uses Generalizability Theory to explore the variation patterns in 52 distinct datasets from studies that used comparable experimental designs to investigate different types of island effects in different languages. Effect sizes (DD-scores) are commonly reported and compared in discussions of island effects. The paper argues that caution is warranted when island effect sizes are compared: the analyses reveal that the reliability of datasets from island experiments is only moderate. At the same time, the analyses show that reliability can be improved relatively simply, namely by increasing the number of items tested in the experiment to six per condition.
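The definition of reliability given in the abstract, and the reason why adding items helps, can be illustrated with a minimal sketch. It assumes a simple one-facet design in which the error variance of a score averaged over n items shrinks to error_var / n_items; this is the textbook form of a generalizability coefficient and not necessarily the model fitted in the paper. The function name and the variance components below are hypothetical values chosen for illustration, not estimates from the 52 datasets.

```python
def reliability(true_var: float, error_var: float, n_items: int = 1) -> float:
    """Ratio of true score variance to total variance.

    Assumes scores are averaged over n_items, so the error variance
    of the mean is error_var / n_items (one-facet simplification).
    """
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    return true_var / (true_var + error_var / n_items)


# Hypothetical variance components (illustration only, not from the paper).
TRUE_VAR = 1.0
ERROR_VAR = 2.0

for n in (1, 2, 4, 6, 8):
    # Reliability rises with the number of items, with diminishing returns.
    print(n, round(reliability(TRUE_VAR, ERROR_VAR, n), 3))
```

Under these assumed components, reliability climbs steeply over the first few items and then flattens, which is the general pattern behind a recommendation for a specific number of items per condition.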
Format: [ pdf ]
Reference: lingbuzz/009059
(please use that when you cite this article)
Published in:
keywords: replication; reliability; judgment data; experimental syntax; island effects; syntax
previous versions: v1 [March 2025]
Downloaded: 111 times

 
