Field of the Invention
The present invention relates generally to information processing, and more particularly to mapping user models to an inverted data index for retrieval, filtering and recommendation.
Related Art
An inverted index is an index data structure that stores a mapping from content, such as words or numbers, to the locations of that content in a document or a set of documents. The term “document” refers to whatever units a retrieval system is built over and is to be broadly interpreted to include any machine-readable and machine-storable work product. For example, a document may be a file containing words, numbers or symbols, a text document such as a memo or a chapter of a book, and the like. The files may be of any type, such as text, audio, image, video, etc. A set of documents over which a retrieval is performed is referred to as a “collection”, “document collection”, “corpus” or “body” of documents. The set of distinct terms (also referred to as “tokens”) occurring in the collection is its “dictionary” (also referred to as “vocabulary” or “lexicon”).
For each term in the dictionary, a so-called “postings list” records the documents in which the term occurs, as well as the positions of the term within each of those documents. In a typical implementation, each postings list contains, for every document in which the term occurs, the number of occurrences of the term in that document and their positions. A query term, typically a word, can be mapped to its postings list to identify the documents that contain that term.
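As a non-authoritative illustration of the dictionary and postings lists described above, the following minimal Python sketch builds a positional inverted index over a small in-memory collection. It assumes simple lowercasing and whitespace tokenization; the function names `build_inverted_index` and `lookup` and the sample collection are hypothetical and chosen only for exposition.

```python
from collections import defaultdict


def build_inverted_index(collection):
    """Map each term to its postings list, represented as {doc_id: [positions]}.

    Assumption: terms are produced by lowercasing and splitting on whitespace.
    The number of occurrences of a term in a document is len(positions).
    """
    index = defaultdict(dict)
    for doc_id, text in collection.items():
        for position, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(position)
    return dict(index)


def lookup(index, query_term):
    """Map a query term to its postings list and return the matching document ids."""
    postings = index.get(query_term.lower(), {})
    return sorted(postings)


# Hypothetical sample collection keyed by document id.
collection = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "the quick dog jumps over the lazy fox",
}

index = build_inverted_index(collection)
dictionary = sorted(index)  # the collection's vocabulary (distinct terms)

print(index["fox"])          # {1: [3], 3: [7]}
print(lookup(index, "dog"))  # [2, 3]
```

Each key of `index` is a dictionary term, and each value is that term's postings list; the count of occurrences is derived from the length of the position list rather than stored separately, which is one of several reasonable layouts.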
It is generally known that inverted indices enable fast full-text searches, but they are slow to update, for example, when a document is added to the indexed collection, deleted from it, or otherwise modified.
A relational database is a collection of data items organized as a set of formally described tables from which data can be accessed or reassembled in many different ways without having to reorganize the tables. A typical relational database employs a set of tables containing data fitted into predefined categories. Each table (sometimes called a relation) contains one or more data categories in columns. The standard user and application program interface to a relational database is the Structured Query Language (SQL).
A relational database management system (RDBMS) is a database management system that manages relational databases and is capable of storing and retrieving large volumes of data. Further, large-scale relational database management systems can be implemented to support thousands of users accessing databases via a wide assortment of applications. An RDBMS can be structured to support a variety of different types of operations for a requesting entity (e.g., an application, the operating system or an end user). Such operations can be configured to retrieve, add, modify and delete data being stored and managed by the RDBMS. Standard database access methods support these operations using high-level query languages, such as SQL. One of the primary operations performed with SQL is querying (also referred to herein as retrieving or selecting) data from data structures within a database.
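To make the table structure and the add, modify, delete and retrieve operations above concrete, the following sketch uses SQLite through Python's standard `sqlite3` module as a stand-in RDBMS. The `catalog` table, its columns and its rows are hypothetical examples, not part of any particular system.

```python
import sqlite3

# In-memory SQLite database standing in for an RDBMS (illustrative only).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A table (relation) whose columns are predefined data categories.
cur.execute(
    """
    CREATE TABLE catalog (
        item_id INTEGER PRIMARY KEY,
        title   TEXT NOT NULL,
        genre   TEXT,
        price   REAL
    )
    """
)

# Add: insert rows using parameterized SQL.
cur.executemany(
    "INSERT INTO catalog (item_id, title, genre, price) VALUES (?, ?, ?, ?)",
    [
        (1, "Quick Brown Fox", "nature", 9.99),
        (2, "Lazy Dog", "humor", 4.50),
        (3, "Fox and Dog", "nature", 12.00),
    ],
)

# Modify: change one row's price.
cur.execute("UPDATE catalog SET price = ? WHERE item_id = ?", (7.25, 2))

# Delete: remove one row.
cur.execute("DELETE FROM catalog WHERE item_id = ?", (1,))
conn.commit()

# Retrieve (query / select): rows in one category, ordered by key.
rows = cur.execute(
    "SELECT item_id, title, price FROM catalog WHERE genre = ? ORDER BY item_id",
    ("nature",),
).fetchall()
print(rows)  # [(3, 'Fox and Dog', 12.0)]
```

Parameter placeholders (`?`) are used instead of string formatting so that values are passed to the database separately from the SQL text.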
One technical challenge involves merging data in a relational database with the data in an inverted index. Another technical challenge involves bridging a catalog record stored in a relational database to at least one inverted index for the purpose of retrieval, filtering and recommendation.
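The following deliberately naive Python sketch illustrates the problem stated above; it is an assumed example for exposition and is not the approach of the present invention. It builds an inverted index over one text column of a relational table, maps a query term to row keys, and fetches the matching catalog records with SQL. It also shows one difficulty: after a row is updated in the relational database, the separately built index becomes stale until it is rebuilt. The helpers `index_column` and `search_records` and the `catalog` table are hypothetical.

```python
import sqlite3
from collections import defaultdict


def index_column(conn, table, key_col, text_col):
    """Build a term -> {row key: [positions]} inverted index over one text column.

    Table and column names are trusted identifiers in this sketch, never user input.
    """
    index = defaultdict(dict)
    for key, text in conn.execute(f"SELECT {key_col}, {text_col} FROM {table}"):
        for pos, term in enumerate((text or "").lower().split()):
            index[term].setdefault(key, []).append(pos)
    return dict(index)


def search_records(conn, index, term):
    """Map a query term to row keys via the index, then fetch those catalog rows."""
    keys = sorted(index.get(term.lower(), {}))
    if not keys:
        return []
    placeholders = ",".join("?" * len(keys))  # e.g. "?,?" for two keys
    return conn.execute(
        f"SELECT item_id, title FROM catalog WHERE item_id IN ({placeholders}) "
        "ORDER BY item_id",
        keys,
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalog (item_id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO catalog VALUES (?, ?)",
    [(1, "Quick Brown Fox"), (2, "Lazy Dog"), (3, "Fox and Dog")],
)

index = index_column(conn, "catalog", "item_id", "title")
print(search_records(conn, index, "fox"))  # [(1, 'Quick Brown Fox'), (3, 'Fox and Dog')]

# Updating the relational record does not update the separately built index.
conn.execute("UPDATE catalog SET title = ? WHERE item_id = ?", ("Red Fox", 2))
stale = search_records(conn, index, "fox")  # still misses row 2
fresh = search_records(conn, index_column(conn, "catalog", "item_id", "title"), "fox")
```

In this sketch, keeping the two stores consistent requires rebuilding or incrementally updating the index whenever the table changes, which reflects the update cost of inverted indices noted earlier.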