Our search for digital knowledge began several decades ago when the idea of digitizing media was commonplace, yet books were still the primary medium for storing knowledge. Before the field of multimedia information retrieval (MIR) coalesced into a scientific community, there were many contributory advances from a wide set of established scientific fields. From a theoretical perspective, areas such as artificial intelligence, computational vision, pattern recognition and knowledge management contributed significantly to the underlying mathematical foundation of MIR. Psychology and related areas such as aesthetics and ergonomics provided basic foundations for the interaction with the user. Further, applications of pictorial search into a database of imagery already existed in niche forms, such as face recognition, robotic guidance, and character recognition. Applications of text retrieval associated with multimedia were largely based on keywords.
The two fundamental necessities for a multimedia information retrieval system are: (1) searching for a particular media item, and (2) browsing and summarizing a media collection. In searching for a particular media item, current systems have significant limitations. No credible, representative real-world test sets exist for evaluation, nor do benchmarking measures exist that are clearly correlated with user satisfaction. In general, current systems have not yet had a significant impact on society, due to an inability to bridge the semantic gap between computers and humans.
Taking the above into account, there clearly remains a need, in the fields of Language Processing and Multimedia Information Retrieval, for systems, apparatuses, circuits, methods and associated computer executable codes that introduce unique approaches to semantic understanding, such as an ability to understand domain-specific concepts, realized by a wide user vocabulary, to transform non-structured natural speech or written text into structured queries (a task sometimes referred to as NLU or NLP), and to estimate the user's satisfaction level resulting from exposure to one or more particular media items.