Mari-Sanna Paukkeri, Marja Ollikainen, and Timo Honkela. Assessing user-specific difficulty of documents. Information Processing & Management, 49(1):198–212, 2013.


On the Web, a huge variety of text collections contain knowledge in different expertise domains, such as technology or medicine. The texts are written for different uses and thus for people having different levels of expertise on the domain. Texts intended for professionals may not be understandable at all by a lay person, and texts for lay people may not contain all the detailed information needed by a professional. Many information retrieval applications, such as search engines, would offer better user experience if they were able to select the text sources that best fit the expertise level of the user. In this article, we propose a novel approach for assessing the difficulty level of a document: our language-independent method assesses difficulty for each user separately. The method enables, for instance, offering information in a personalised manner based on the user's knowledge of different domains. The method is based on the comparison of terms appearing in a document and terms known by the user. We present two ways to collect information about the terminology the user knows: by directly asking the users the difficulty of terms or, as a novel automatic approach, indirectly by analysing texts written by the users. We examine the applicability of the methodology with text documents in the medical domain. The results show that the method is able to distinguish between documents written for lay people and documents written for experts.
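
The core idea in the abstract, comparing the terms in a document with the terms a user knows, can be illustrated with a small sketch. This is not the authors' implementation: the function names (`tokenize`, `known_terms_from_writing`, `difficulty`), the simple word tokenizer, and the score (the share of distinct document terms missing from the user's vocabulary) are all assumptions made for illustration. The published method may weight or select terms quite differently.

```python
import re


def tokenize(text):
    """Lowercase the text and return its word tokens (runs of Unicode letters)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def known_terms_from_writing(user_texts):
    """Assumed 'indirect' approach: treat every term the user has written as known."""
    vocabulary = set()
    for text in user_texts:
        vocabulary.update(tokenize(text))
    return vocabulary


def difficulty(document, known_terms):
    """Hypothetical score: fraction of distinct document terms the user does not know."""
    terms = set(tokenize(document))
    if not terms:
        return 0.0
    unknown = terms - known_terms
    return len(unknown) / len(terms)


# Toy example: the user's vocabulary is inferred from one sentence they wrote.
known = known_terms_from_writing(["The patient has a fever and a cough."])
lay_score = difficulty("The patient has a cough", known)
expert_score = difficulty("The patient has idiopathic thrombocytopenia", known)
```

In this toy setting the expert-oriented sentence scores higher than the lay sentence for this user, because two of its five distinct terms are absent from the user's writing. The "direct" approach mentioned in the abstract would replace `known_terms_from_writing` with a set of terms the user has explicitly marked as known.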

Suggested BibTeX entry:

    author = {Mari-Sanna Paukkeri and Marja Ollikainen and Timo Honkela},
    journal = {Information Processing \& Management},
    language = {eng},
    number = {1},
    pages = {198--212},
    title = {Assessing user-specific difficulty of documents},
    volume = {49},
    year = {2013},

See ...