A Source System of Record (“SSoR”) is an information storage and retrieval system that is the authoritative source for a particular data element or piece of information in a system containing multiple sources of the same element. To ensure data integrity, there must be one, and only one, system of record for a given piece of information. In a large network, multiple information systems or sources often disagree about a data element or piece of information. These disagreements may stem from semantic differences, from the use of different sources, or simply from an error or bug, among other causes.
Without an association with a reputable source, such as the SSoR, the integrity and validity of any piece of data can be suspect. Accordingly, maintaining an SSoR is often a key requirement for Enterprise Search solutions, which assume continuous data updates from one or more content authors, called the “push.” The original data submitted by the content authors must be stored in the SSoR to allow reprocessing without depending on the content authors. The ability to reprocess data is an essential requirement, for example, for systems which use taxonomy-based drill-down. If the taxonomy is changed, affected documents have to be reprocessed to ensure that the taxonomy changes are reflected in the appropriate index fields.
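The reprocessing requirement described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `SourceSystemOfRecord` class, the keyword-based `categorize` rule, and the sample documents and taxonomies are hypothetical and are not taken from any particular enterprise search product. The sketch shows only that, because the original submissions are retained, the drill-down fields can be rebuilt after a taxonomy change without asking the content authors to push again.

```python
from dataclasses import dataclass, field


@dataclass
class SourceSystemOfRecord:
    """Hypothetical in-memory SSoR that keeps the original pushed documents."""
    originals: dict = field(default_factory=dict)

    def push(self, doc_id: str, text: str) -> None:
        # Store the author's submission unchanged so it can be reprocessed later.
        self.originals[doc_id] = text


def categorize(text: str, taxonomy: dict) -> list:
    """Return the sorted taxonomy categories whose keywords appear in the text."""
    words = set(text.lower().split())
    return sorted(cat for cat, keywords in taxonomy.items() if words & set(keywords))


def build_index(ssor: SourceSystemOfRecord, taxonomy: dict) -> dict:
    """Derive the drill-down field of every document from the stored originals."""
    return {doc_id: categorize(text, taxonomy) for doc_id, text in ssor.originals.items()}


ssor = SourceSystemOfRecord()
ssor.push("doc-1", "Quarterly revenue report")
ssor.push("doc-2", "Server outage postmortem")

# Assumed initial taxonomy: category name -> keywords.
taxonomy_v1 = {"finance": ["revenue"], "operations": ["outage"]}
index_v1 = build_index(ssor, taxonomy_v1)

# The taxonomy changes: a new "reporting" category is introduced.
taxonomy_v2 = {"finance": ["revenue"], "operations": ["outage"], "reporting": ["report"]}
# Reprocessing reads only from the SSoR; no content author is contacted.
index_v2 = build_index(ssor, taxonomy_v2)
```

In this sketch, `doc-1` gains the new category after reprocessing while `doc-2` is unaffected, which is why only the affected documents strictly need to be reprocessed.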
This need to reprocess and update must be balanced with the need to use the same data to build, via one or more predefined rules, an optimized index such as a Search Index (SI) for the search frontend. Service-level agreements with content authors often define a certain maximum time for a document to reach a frontend index. Accordingly, it is important to ensure that the requirements of the service-level agreements are met even when the documents and/or metadata have to be recalculated from the SSoR.
Existing enterprise search solutions do not allow reprocessing of data while accepting new push, update, and/or delete requests. Instead, these solutions either block incoming requests whenever reprocessing occurs, usually by queueing them, or queue updates in batches and process them one batch after another. This approach results in significant delays in content processing, which can reduce efficiency and in many cases can violate the terms of one or more service-level agreements that define a maximum time for a document to reach a frontend index.
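The delay caused by blocking can be made concrete with a small deterministic simulation. This is a sketch under stated assumptions, not a model of any specific product: the function name `simulate_blocking_reprocess`, the reprocessing duration, the arrival times, the per-document indexing cost, and the 300-second limit are all hypothetical values chosen for illustration.

```python
from collections import deque


def simulate_blocking_reprocess(reprocess_seconds, push_arrivals, index_seconds_per_doc):
    """Return the time each pushed document takes to reach the frontend index
    when pushes are queued until reprocessing (started at t=0) has finished."""
    queue = deque(sorted(push_arrivals))
    clock = reprocess_seconds  # the indexer is busy until reprocessing ends
    latencies = []
    while queue:
        arrived = queue.popleft()
        start = max(clock, arrived)  # wait for the indexer or for the arrival
        clock = start + index_seconds_per_doc
        latencies.append(clock - arrived)
    return latencies


SLA_SECONDS = 300  # hypothetical maximum time to reach the frontend index

latencies = simulate_blocking_reprocess(
    reprocess_seconds=600,
    push_arrivals=[10, 200, 650],
    index_seconds_per_doc=5,
)
violations = [t for t in latencies if t > SLA_SECONDS]
```

Under these assumed numbers, the two documents pushed while reprocessing is running wait 595 and 410 seconds and exceed the limit, whereas the document pushed after reprocessing completes is indexed in 5 seconds.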
Accordingly, there is a continued need in the art for systems and methods that allow on-demand updating of a search index from the SSoR while simultaneously accepting push, update, and/or delete requests from content authors.