Technical Field
The present invention generally relates to evaluating the difficulty of documents and, more particularly, to a method, a system, and a program product for calculating the difficulty of documents.
Description of the Related Art
Recently, online education has become popular, and huge numbers of educational documents have been published and made available online. Learners can select documents at a level suitable for them and study using the selected documents. However, the sheer number of educational documents available through networks makes it difficult to find documents suitable for a particular learner. This situation creates a need for novel search technology for educational documents.
Several methods have been proposed to evaluate the difficulty of documents. For example, Non-Patent Literature 1 (Smith, Dean R., et al., “The Lexile Scale in Theory and Practice. Final Report”, technical report, 1989, NIH Grant HD-19448, http://files.eric.ed.gov/fulltext/ED307577.pdf) discloses a three-part correlational study examining the explanatory power of the Lexile theory of reading comprehension, which is based on the semantic and syntactic components of prose. Non-Patent Literature 2 (Kondo, et al., “Difficulty Estimation of Japanese Text using Textbook Corpus”, 14th Annual Meeting, Association for Natural Language Processing, pp. 1113-1116, March 2008, http://must.c.u-tokyo.ac.jp/nlpann/pdf/nlp2008/D5-05.pdf) discloses a method for estimating the difficulty of Japanese text using a textbook corpus. Furthermore, Non-Patent Literature 3 (Nakayama, et al., “An Estimation of Academic Books using Reviews”, Japanese Society for Artificial Intelligence, March 2012, https://www.jstage.jst.go.jp/article/tjsai/27/3/27_3_213/_pdf) discloses a method for estimating the difficulty of texts using review information provided by users.
Furthermore, Non-Patent Literature 4 (Flesch, Rudolf, “A new readability yardstick”, Journal of Applied Psychology, Vol. 32(3), June 1948, pp. 221-233) discloses a readability measure, the “Reading Ease” score, computed from superficial features of documents, namely the average sentence length and the average number of syllables per word. Another strategy estimates the difficulty of documents from the locality of the words they contain, without using document labels. Specifically, Non-Patent Literature 5 (Nishihara, et al., “Information Acquiring Support System Based on Keyword Continuity and Informational Difficulty,” International Conference on Human-Computer Interaction, September 2005, http://www.panda.sys.t.u-tokyo.ac.jp/nishihara/pdf/hci2005.pdf) discloses a method for estimating the difficulty of documents using keyword continuity.
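The superficial-feature approach of Non-Patent Literature 4 is commonly stated as Reading Ease = 206.835 - 1.015 x (words per sentence) - 84.6 x (syllables per word), where higher scores indicate easier text. The following is a minimal Python sketch of that formula, not an implementation from any cited work. The sentence splitter and the vowel-group syllable counter are simplifying assumptions (Flesch counted syllables by hand), so scores will differ somewhat from hand-counted values.

```python
import re


def count_syllables(word: str) -> int:
    """Heuristic syllable count (assumption): number of vowel groups,
    minus one for a trailing silent 'e' (but not '-le'), minimum one."""
    w = word.lower()
    count = len(re.findall(r"[aeiouy]+", w))
    if w.endswith("e") and not w.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: 206.835 - 1.015 * ASL - 84.6 * ASW."""
    # Naive sentence split on terminal punctuation (assumption).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    asl = len(words) / len(sentences)  # average sentence length in words
    asw = syllables / len(words)       # average syllables per word
    return 206.835 - 1.015 * asl - 84.6 * asw
```

For the six-word, six-syllable sentence "The cat sat on the mat." the sketch yields 206.835 - 6.09 - 84.6 = 116.145, whereas a sentence made of long technical words scores far lower. The score depends only on surface counts, which is why such measures cannot tell whether a long word is actually hard for a given reader.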
In this approach, highly localized keywords are identified as keywords that make a document difficult. However, some keywords do not affect the difficulty of a document even though they are highly localized; one example is a person's name mentioned in a statistics textbook. As described above, many techniques for evaluating the difficulty of documents have been proposed. However, their usability, including applicability to a wide variety of documents, consistency, and accuracy, is still insufficient, and novel techniques continue to be sought and developed.