types2 is a freely available corpus tool for comparing the frequencies of words, types, and hapax legomena across subcorpora. The tool uses accumulation curves and the statistical technique of permutation testing to compare the subcorpora with a “typical” corpus of a similar size, in order to visualize the frequencies and to identify statistically significant findings.
The software is written by Jukka Suomela, and the system is designed and developed in collaboration with Tanja Säily. The sample data sets are provided by Tanja Säily. Please see the paper “types2: Exploring word-frequency differences in corpora” for more information on how to use the tool.
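The permutation-testing idea mentioned above can be illustrated with a minimal Python sketch. This is not the code used by types2; it only assumes that a corpus is a list of samples (each a list of word tokens) and that a subcorpus is a subset of those samples. The names `type_count_at` and `permutation_p_value`, and the toy data, are hypothetical and chosen for illustration. The sketch shuffles the samples many times, accumulates them until the token count of the subcorpus is reached, and reports how often such a random "typical" corpus has at most as many types as the subcorpus.

```python
import random


def type_count_at(samples, order, target_tokens):
    """Count distinct types seen while reading whole samples in the given
    order, stopping once at least target_tokens tokens have been read."""
    seen = set()
    tokens = 0
    for i in order:
        if tokens >= target_tokens:
            break
        seen.update(samples[i])
        tokens += len(samples[i])
    return len(seen)


def permutation_p_value(samples, subcorpus_ids, iterations=10000, seed=0):
    """Return (observed type count, estimated one-sided p-value).

    The p-value is the fraction of random sample orderings whose
    accumulated type count, at roughly the subcorpus size, is at most
    the type count observed in the subcorpus (the "low" tail).
    """
    rng = random.Random(seed)
    target = sum(len(samples[i]) for i in subcorpus_ids)
    observed = len(set().union(*(samples[i] for i in subcorpus_ids)))
    order = list(range(len(samples)))
    at_most = 0
    for _ in range(iterations):
        rng.shuffle(order)
        if type_count_at(samples, order, target) <= observed:
            at_most += 1
    return observed, at_most / iterations


# Toy corpus (illustrative data only): four samples of three tokens each.
samples = [
    ["a", "a", "a"],
    ["a", "b", "c"],
    ["d", "e", "f"],
    ["a", "a", "a"],
]

# Subcorpus made of the two repetitive samples.
observed, p = permutation_p_value(samples, [0, 3], iterations=2000)
print(observed, p)
```

A small p-value here suggests that the subcorpus has fewer types than a randomly assembled corpus of a similar size. Shuffling whole samples rather than individual words keeps the words of each text together, which is one common design choice for this kind of test.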
Populate db/types.sqlite with your input data
Run bin/types-run to perform data analysis
Run bin/types-web to create the web user interface
Open web/index.html in a web browser
    git clone https://github.com/suomela/types.git
    git clone https://github.com/suomela/types-examples.git
    cd types
    ./config
    make
    mkdir db
    cp ../types-examples/bnc-input/db/types.sqlite db/types.sqlite
    bin/types-run --citer=100000 --piter=100000
    bin/types-web
    open web/index.html