Technical Field
The present disclosure relates to information technology, and, more particularly, to natural language processing (NLP) systems.
Discussion of Related Art
News agencies, bloggers, Twitter users, scientific journals, and conferences all produce extremely large amounts of unstructured data in textual, audio, and video form. Large amounts of such unstructured data and information can be gathered from multiple modalities in multiple languages, e.g., from Internet text, audio, and video sources. There is a need for analyzing this information and producing a compact representation of: 1) information such as actions of specific entities (e.g., persons, organizations, countries); 2) activities (e.g., the presidential election campaign); and 3) events (e.g., the death of a celebrity). Currently, such representations can be produced manually, but this solution is not cost-effective and requires skilled workers, especially when the information is gathered from multiple languages. Such manually produced representations are also generally not scalable.