The present invention, in some embodiments thereof, relates to semantic analysis and, more particularly, but not exclusively, to methods and systems of supervised learning of semantic relatedness.
In recent years, the problem of automatically determining semantic relatedness has been steadily gaining attention among statistical natural language processing (NLP) and artificial intelligence (AI) researchers. As used herein, semantic relatedness (SR) means semantic similarity, semantic distance, semantic relatedness, and/or a quantification of a relation between terms. This surge in semantic relatedness research has been reinforced by the emergence of applications that can greatly benefit from semantic relatedness capabilities, such as targeted advertising, content aggregation, content presentation, information retrieval, web search, automatic tagging and linking, and text categorization.
With few exceptions, most of the algorithms proposed for SR valuation have followed unsupervised learning and/or knowledge engineering procedures, whereby semantic information is extracted from a (structured) background knowledge corpus using predefined formulas or procedures.
An example of supervised SR learning is described in E. Agirre, E. Alfonseca, K. Hall, J. Kravalova, M. Pasca, and A. Soroa. A study on similarity and relatedness using distributional and wordnet-based approaches. In NAACL, pages 19-27, Morristown, N.J., USA, 2009. Association for Computational Linguistics, which is incorporated herein by reference. This publication teaches a classification task in which, given two pairs of terms, the classifier determines which pair contains the terms that are more related to each other. Each instance, consisting of two pairs {t1; t2} and {t3; t4}, is represented as a feature vector constructed using SR scores and ranks from unsupervised SR methods. Using a support vector machine (SVM), this approach achieved a correlation of 0.78 with the WordSimilarity-353 Test Collection, see Lev Finkelstein, Evgeniy Gabrilovich, Yossi Matias, Ehud Rivlin, Zach Solan, Gadi Wolfman, and Eytan Ruppin, “Placing Search in Context: The Concept Revisited”, ACM Transactions on Information Systems, 20(1):116-131, January 2002, which is incorporated herein by reference. The structure-free background knowledge used for achieving this result consisted of four billion web documents.
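By way of a non-limiting illustration, the pairwise instance representation described above may be sketched as follows. The sketch is not taken from the cited publication: the function names, the ordering of features within the vector, and the toy score tables are assumptions made for illustration only. It shows one way an instance made of two term pairs could be turned into a feature vector of SR scores and ranks, which could then be passed to a classifier such as an SVM.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Pair = Tuple[str, str]
Scorer = Callable[[str, str], float]


def make_table_scorer(table: Dict[FrozenSet[str], float]) -> Scorer:
    """Wrap a lookup table as a symmetric SR scorer (hypothetical stand-in
    for an unsupervised SR method)."""
    return lambda t1, t2: table.get(frozenset((t1, t2)), 0.0)


def rank_of(pair: Pair, scorer: Scorer, all_pairs: Sequence[Pair]) -> int:
    """Rank of `pair` among `all_pairs` under `scorer` (1 = most related)."""
    ordered = sorted(all_pairs, key=lambda p: scorer(*p), reverse=True)
    return ordered.index(pair) + 1


def instance_features(pair_a: Pair, pair_b: Pair,
                      scorers: Sequence[Scorer],
                      all_pairs: Sequence[Pair]) -> List[float]:
    """Feature vector for the instance ({t1; t2}, {t3; t4}).

    Assumed layout, per scorer: score(a), score(b), rank(a), rank(b).
    """
    features: List[float] = []
    for scorer in scorers:
        features.extend([
            scorer(*pair_a),
            scorer(*pair_b),
            float(rank_of(pair_a, scorer, all_pairs)),
            float(rank_of(pair_b, scorer, all_pairs)),
        ])
    return features


# Toy data: the scores below are invented for illustration only.
pairs: List[Pair] = [("car", "automobile"), ("car", "banana"), ("tiger", "cat")]
scorer_a = make_table_scorer({
    frozenset(("car", "automobile")): 0.9,
    frozenset(("car", "banana")): 0.1,
    frozenset(("tiger", "cat")): 0.7,
})
scorer_b = make_table_scorer({
    frozenset(("car", "automobile")): 0.8,
    frozenset(("car", "banana")): 0.2,
    frozenset(("tiger", "cat")): 0.85,
})

# One training instance: is pair 0 more related than pair 1?
features = instance_features(pairs[0], pairs[1], [scorer_a, scorer_b], pairs)
label = 1  # assumed label: 1 means the first pair is the more related one
print(features, label)
```

In such a sketch, many instances of this form, each with a label indicating which pair is the more related one, would make up the training set for the classifier. The choice of classifier and of the underlying unsupervised scorers is left open here.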