1. Field of the Invention
Embodiments of the present invention pertain to a method and device for searching a stream of discrete content objects for specific content patterns. In particular, embodiments of the present invention pertain to techniques for applying new searches, and successive alterations of existing searches, both to the live stream and to the historical archive of the stream's contents.
2. Discussion of Background Information
In business, there is considerable attention directed to, e.g., improving customer service and/or monitoring employees to ensure that the company's interests are not compromised. By way of non-limiting example, Tire Store A has an 800-number and a bank of 15 telephones. The telephones may be manned by operators or may be automated to simply play a prerecorded message and then record a caller's comments/criticisms for Tire Store A. In either case, it is advantageous for all incoming and/or outgoing calls to be recorded. As hundreds of calls a day may be received, it is very difficult, if not impossible, for an individual or group of individuals to listen to each incoming call. Thus, businesses such as Tire Store A can employ or subscribe to a service that searches recorded calls for certain terms, e.g., “dissatisfied,” “rude,” “never shopping here again,” etc., so that these calls can be reviewed in detail to find and correct the problem.
There are currently several well-established technologies for processing an audio recording to analyze its spoken content, including phonetic and large vocabulary speech recognition models, offered by companies such as Aurix, Nice Systems, Philips Electronics, and Nuance. Similarly, existing technologies can search the contents of computer files, emails and other digital media. In each case, broadly speaking, an indexing engine of the known art converts a given content object, e.g., an audio recording, a video recording, email text, etc., into a form compatible with a specified search technology. The indexed content object can be stored in a content store, which is a virtual collection of indexed content objects that is stored in one or more memory or storage devices and that is associated with, e.g., the same user or subscriber of this technology. A separate search engine may then be tasked to search given content objects for specified content, e.g., words or phrases of interest to the user or subscriber, based upon a set of search definitions. The operations of these known search engines fall into two categories: those which apply a given set of search definitions to a fixed content store; and those which apply a given set of search definitions to newly received additions to a content store.
The former category is usually associated with large stores, e.g., thousands of content objects, which can require a significant amount of processing power. Thus, such searches are often run via a batch process. The processing strategy here is very simple: every search definition is applied to every member of the content store, e.g., each search definition is successively applied to all content objects in the content store, or all search definitions are successively applied to each content object in the content store.
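By way of non-limiting illustration, the two batch orderings described above can be sketched as follows. The sketch assumes that a content store is a mapping from object identifier to indexed text and that a search definition is a single phrase; the names `matches`, `batch_by_definition` and `batch_by_object` are hypothetical, and the case-insensitive substring test merely stands in for whatever search technology is actually used.

```python
from typing import Dict, Set, Tuple

Hit = Tuple[str, str]  # (definition id, content object id)


def matches(phrase: str, text: str) -> bool:
    # Placeholder for a real search engine: case-insensitive substring test.
    return phrase.lower() in text.lower()


def batch_by_definition(store: Dict[str, str], definitions: Dict[str, str]) -> Set[Hit]:
    # Each search definition is successively applied to all content objects.
    hits: Set[Hit] = set()
    for def_id, phrase in definitions.items():
        for obj_id, text in store.items():
            if matches(phrase, text):
                hits.add((def_id, obj_id))
    return hits


def batch_by_object(store: Dict[str, str], definitions: Dict[str, str]) -> Set[Hit]:
    # All search definitions are successively applied to each content object.
    hits: Set[Hit] = set()
    for obj_id, text in store.items():
        for def_id, phrase in definitions.items():
            if matches(phrase, text):
                hits.add((def_id, obj_id))
    return hits
```

Both orderings perform the same number of evaluations (definitions multiplied by content objects) and produce the same set of hits; they differ only in which loop is outermost.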
The latter category, since it searches only the newly arrived content objects, is much less processor-intensive, i.e., upon receipt of a new content object to the content store, the search definitions are applied to the new content object. However, in the event the user or subscriber wants to perform new searches and/or make changes to existing searches, e.g., adding or removing terms or phrases, the new/changed searches will only be applied to those content objects received after the time that the new/changed searches are made effective, i.e., the new/changed searches will not be applied to the previously searched content objects. The processing strategy here is more complex, typically requiring a software agent to decide which search definitions, if any, to evaluate for a given newly arrived content object. Software agents like this, which mediate the interaction between a computer system's data stores and its user interactions, are commonly referred to as “middleware”.
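The limitation of this arrival-driven category can be illustrated with a minimal sketch. The class name `ArrivalOnlySearcher` and its methods are hypothetical, and substring matching again stands in for the actual search technology; the point is only that a definition is evaluated against objects that arrive after it becomes active.

```python
from typing import Dict, List, Tuple


class ArrivalOnlySearcher:
    """Applies the currently active definitions to each newly arrived object only."""

    def __init__(self) -> None:
        self.definitions: Dict[str, str] = {}   # definition id -> phrase
        self.hits: List[Tuple[str, str]] = []   # (definition id, object id)

    def activate(self, def_id: str, phrase: str) -> None:
        # A new or changed search takes effect for future arrivals only.
        self.definitions[def_id] = phrase

    def on_arrival(self, obj_id: str, text: str) -> None:
        # Evaluate every active definition against the new object, then move on.
        for def_id, phrase in self.definitions.items():
            if phrase.lower() in text.lower():
                self.hits.append((def_id, obj_id))


searcher = ArrivalOnlySearcher()
searcher.activate("d1", "rude")
searcher.on_arrival("call-1", "The clerk was rude and I am dissatisfied")
searcher.activate("d2", "dissatisfied")   # activated after call-1 was searched
searcher.on_arrival("call-2", "Very dissatisfied with the service")
```

In this example, `call-1` contains the word “dissatisfied” but is never reported for definition `d2`, because `d2` was activated after `call-1` had already been searched.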
In order to apply new searches, and successive alterations of existing searches, to both the live stream and to the historical archive of the stream's contents, a significant modification of the software middleware would be required: in addition to deciding which search definitions to evaluate for a given newly arrived content object, it may also be necessary to decide which content objects to evaluate for a newly activated search definition. Further, since the latter activity could generate very large processing loads, i.e., searching large numbers of content objects, this new middleware must also be able to equitably allocate processing resources across search requests.
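One possible shape for such middleware is sketched below, purely as an illustration under stated assumptions and not as the claimed implementation. The class `FairSearchMiddleware`, its method names, the per-call `budget` parameter and the round-robin policy are all hypothetical choices: newly arrived objects are evaluated against every active definition, a newly activated definition is queued against a snapshot of the archive, and the archive backlog is worked off one evaluation per search request in turn so that no single request monopolizes processing.

```python
from collections import deque
from typing import Deque, Dict, Set, Tuple


class FairSearchMiddleware:
    def __init__(self) -> None:
        self.archive: Dict[str, str] = {}       # object id -> indexed text
        self.definitions: Dict[str, str] = {}   # definition id -> phrase
        self.backlog: Deque[Tuple[str, Deque[str]]] = deque()
        self.hits: Set[Tuple[str, str]] = set()

    def _evaluate(self, def_id: str, obj_id: str) -> None:
        # Substring test stands in for the real search engine.
        if self.definitions[def_id].lower() in self.archive[obj_id].lower():
            self.hits.add((def_id, obj_id))

    def on_arrival(self, obj_id: str, text: str) -> None:
        # Live stream: decide which definitions to evaluate for a new object.
        self.archive[obj_id] = text
        for def_id in self.definitions:
            self._evaluate(def_id, obj_id)

    def activate(self, def_id: str, phrase: str) -> None:
        # Re-activating an id replaces it: drop its stale hits and backlog.
        self.hits = {h for h in self.hits if h[0] != def_id}
        self.backlog = deque(e for e in self.backlog if e[0] != def_id)
        self.definitions[def_id] = phrase
        # Archive: decide which stored objects the new definition must visit.
        pending = deque(self.archive)
        if pending:
            self.backlog.append((def_id, pending))

    def run_backlog(self, budget: int) -> int:
        # Round-robin: one evaluation per queued search request per turn.
        done = 0
        while self.backlog and done < budget:
            def_id, pending = self.backlog.popleft()
            self._evaluate(def_id, pending.popleft())
            done += 1
            if pending:
                self.backlog.append((def_id, pending))
        return done
```

Calling `run_backlog` with a small budget between arrivals lets live-stream evaluation proceed while the historical search completes incrementally; other allocation policies (e.g., weighted or priority-based) could be substituted for the round-robin loop.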