Comparing type counts: The case of women, men and -ity in early English letters

Säily, Tanja; Suomela, Jukka

doi:10.1163/9789042025981_007

ICAME 2007 · 28th Annual Conference of the International Computer Archive for Modern and Medieval English, Stratford-upon-Avon, UK, May 2007

Abstract

This work is a case study of applying nonparametric statistical methods to corpus data. We show how to use ideas from permutation testing to answer linguistic questions related to morphological productivity and type richness. In particular, we study the use of the suffixes -ity and -ness in the 17th-century part of the Corpus of Early English Correspondence within the framework of historical sociolinguistics. Our hypothesis is that the productivity of -ity, as measured by type counts, is significantly low in letters written by women. To test such hypotheses, and to facilitate exploratory data analysis, we take the approach of computing accumulation curves for types and hapax legomena. We have developed an open-source computer program that uses Monte Carlo sampling to compute the upper and lower bounds of these curves for one or more levels of statistical significance. By comparing the type accumulation from women’s letters with the bounds, we are able to confirm our hypothesis.
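The general idea of bounding a type accumulation curve by Monte Carlo permutation can be sketched in a few lines of Python. This is a simplified illustration, not the authors' program: the function names, the toy letters, and the choice to measure corpus size in number of letters are all assumptions made for the example, and the bounds are plain pointwise empirical quantiles.

```python
import random

def type_accumulation(samples):
    """Cumulative count of distinct types after each sample, in the given order."""
    seen = set()
    curve = []
    for sample in samples:
        seen.update(sample)
        curve.append(len(seen))
    return curve

def monte_carlo_bounds(samples, n_permutations=1000, level=0.05, seed=0):
    """Pointwise lower and upper bounds of the accumulation curve.

    Shuffles the samples n_permutations times and, at each position k,
    takes the empirical `level` and `1 - level` quantiles of the type
    count reached after k + 1 samples.
    """
    rng = random.Random(seed)
    order = list(samples)
    curves = []
    for _ in range(n_permutations):
        rng.shuffle(order)
        curves.append(type_accumulation(order))
    lo_idx = int(n_permutations * level)
    hi_idx = n_permutations - 1 - lo_idx
    lower, upper = [], []
    for k in range(len(order)):
        column = sorted(curve[k] for curve in curves)
        lower.append(column[lo_idx])
        upper.append(column[hi_idx])
    return lower, upper

# Toy data (hypothetical): each set holds the -ity types found in one letter.
letters = [
    {"ability"}, {"quality", "necessity"}, set(), {"ability", "civility"},
    {"quality"}, {"curiosity", "ability"}, {"necessity"}, set(),
]
# Hypothetical subcorpus, e.g. the letters written by one social group.
subcorpus = [letters[0], letters[4], letters[7]]

lower, upper = monte_carlo_bounds(letters)
k = len(subcorpus)
observed = len(set().union(*subcorpus))
# The subcorpus counts as "significantly low" at this level only if its
# type count falls below the lower bound for a random selection of k letters.
is_significantly_low = observed < lower[k - 1]
print(observed, lower[k - 1], upper[k - 1], is_significantly_low)
```

Shuffling whole letters rather than individual words keeps the words of each letter together, so the bounds reflect the variation between texts rather than treating every token as independent. A fuller treatment would plot the curves against the running word count rather than the number of letters.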

Publication

Antoinette Renouf and Andrew Kehoe (Eds.): Corpus Linguistics: Refinements and Reassessments, volume 69 of Language and Computers – Studies in Practical Linguistics, pages 87–109, Rodopi, Amsterdam, 2009

ISBN 978-90-420-2597-4
ISSN 0921-5034

Links

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author’s copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.