The ability to quickly search and find relevant information within a large amount of possibly unrelated or superfluous information is of increasing importance as the universe of information that can be accessed by computer continues to grow dramatically. A search query can generate results that obscure the actually desired information in a mountain of other material, some of which may be less relevant, not relevant at all, or nonsensical, even when the information resulting from the search or match attempt actually corresponds to the explicit wording of the search query. Moreover, the results of a search or an attempt to match may not even include the information that is actually desired.
More generally, when the goal is to match one textual item or collection of items of textual information with another item or collection of items of information, the result may not be sufficiently specific: it may include too many matches that are not truly relevant and/or may omit matches that actually are relevant and desired. Internet searches are a prime and common example, but the problem is glaring in many other fields as well.
One approach to a working solution to that problem is described in said U.S. Pat. No. 6,199,067, which is incorporated by reference. It involves the use of particular linguistic patterns and their frequencies of occurrence, extracted from textual passages provided by a user (in addition to the words of a search query) and stored in a user profile data file representative of the user's overall linguistic patterns and the frequencies of occurrence thereof.
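As a rough illustration only, a user profile of this kind can be thought of as a mapping from each linguistic pattern to its frequency of occurrence, accumulated over all of the user's passages and saved to a data file. The sketch below assumes the patterns have already been extracted from each passage as plain strings (the extraction itself is not shown); the JSON file format and the function names `build_profile`, `save_profile` and `load_profile` are hypothetical choices, not part of the referenced patent.

```python
from __future__ import annotations

import json
from collections import Counter
from typing import Iterable


def build_profile(passage_patterns: Iterable[Iterable[str]]) -> Counter:
    """Accumulate pattern frequencies over all passages supplied by a user.

    Assumption: each passage has already been reduced to a list of
    pattern strings by some (unspecified) extraction step.
    """
    profile: Counter = Counter()
    for patterns in passage_patterns:
        # Counter.update adds 1 for every occurrence of each pattern.
        profile.update(patterns)
    return profile


def save_profile(profile: Counter, path: str) -> None:
    """Persist the profile as a JSON data file (pattern -> frequency)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(dict(profile), fh, ensure_ascii=False, indent=2, sort_keys=True)


def load_profile(path: str) -> Counter:
    """Read a profile previously written by save_profile."""
    with open(path, encoding="utf-8") as fh:
        return Counter(json.load(fh))
```

For example, passing `[["a", "b"], ["a"]]` to `build_profile` yields a profile in which pattern `"a"` has frequency 2 and pattern `"b"` has frequency 1.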
The term “frequency of occurrence” corresponds in meaning to the term “weight” regarding predicative phrases, as used in patent applications Ser. No. 12/878,675 filed Sep. 9, 2010 and Ser. No. 13/324,192 filed Dec. 13, 2011, both incorporated by reference in this patent specification. These linguistic patterns are stored in a computer-implemented user profile data file representative of the user's overall linguistic patterns and the frequencies of occurrence thereof, where the term “linguistic patterns” refers to the predicative definitions and predicative phrases of the type discussed in this patent specification. A passage in this context can be any suitable amount of text that can be treated as a paragraph, and may actually be a paragraph. A paragraph can be, by way of example, a subdivision of a written composition that comprises one or more sentences, deals with one or more points or ideas, or gives the words of one speaker, and can be extracted from text based upon textual indicators such as, for example, a hard return or a tab, although other suitable means, indicators or algorithms can be used. Documents that the user may search are stored in a database or are retrievable via the Internet; they are likewise analyzed, and their linguistic patterns and pattern frequencies are extracted and stored in corresponding document profiles. More recently, a system and a method have been developed to eliminate, or at least significantly reduce, lexical noise, as described in said application Ser. No. 12/878,675, which is incorporated by reference.
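The paragraph extraction described above can be sketched as a split on textual indicators. This minimal Python version assumes that any run of carriage returns, line feeds or tab characters marks a paragraph boundary; the function name `split_passages` is hypothetical, and a real system may rely on other indicators or algorithms, as the text allows.

```python
from __future__ import annotations

import re

# Assumption: a hard return (CR and/or LF) or a tab starts a new paragraph.
# Consecutive indicators are treated as a single boundary.
_PARAGRAPH_BREAK = re.compile(r"[\r\n\t]+")


def split_passages(text: str) -> list[str]:
    """Split raw text into candidate passages at hard returns or tabs."""
    pieces = _PARAGRAPH_BREAK.split(text)
    # Drop fragments that are empty or contain only whitespace.
    return [p.strip() for p in pieces if p.strip()]
```

Under this assumption, `split_passages("First idea.\n\tSecond idea.\r\nThird.")` returns `["First idea.", "Second idea.", "Third."]`. A tab in the middle of a line also causes a split, which is the trade-off of treating every tab as an indicator.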
When a user initiates a search for particular data, linguistic patterns are also extracted from the search string; lexical noise is typically eliminated, or at least significantly reduced, from the search string's linguistic patterns in the same way, and the linguistic patterns remaining after such processing are placed in a search profile. The documents are retrieved from the database, and the user profile is then cross-matched with the profiles of the retrieved documents to determine the degree of match, based on a summation of the respective frequencies of occurrence of the matching patterns. The documents whose profiles have the highest degrees of match are presented to the user as being contextually correlated in sense.
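The cross-matching step can be illustrated with profiles held as mappings from pattern to frequency. This is a sketch under stated assumptions: “summation of respective frequencies” is read here as adding, for every pattern present in both profiles, its frequency in the user (or search) profile and its frequency in the document profile; patterns are plain strings; and the names `degree_of_match` and `rank_documents` are hypothetical.

```python
from __future__ import annotations

from typing import Mapping

# Hypothetical representation: a profile maps a linguistic pattern
# (e.g. a predicative phrase rendered as a string) to its frequency
# of occurrence ("weight").
Profile = Mapping[str, int]


def degree_of_match(user_profile: Profile, doc_profile: Profile) -> int:
    """Sum the respective frequencies of the patterns both profiles share.

    Assumption: for each shared pattern, the frequency from the user
    profile and the frequency from the document profile are both added.
    """
    shared = user_profile.keys() & doc_profile.keys()
    return sum(user_profile[p] + doc_profile[p] for p in shared)


def rank_documents(user_profile: Profile,
                   doc_profiles: Mapping[str, Profile],
                   top_n: int = 10) -> list[tuple[str, int]]:
    """Return (document id, score) pairs, highest degree of match first."""
    scored = [(doc_id, degree_of_match(user_profile, prof))
              for doc_id, prof in doc_profiles.items()]
    # Documents that share no pattern with the user profile are dropped.
    scored = [item for item in scored if item[1] > 0]
    # Highest score first; ties broken by document id for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]
```

With a user profile `{"find information": 3, "match sense": 2}` and document profiles `d1 = {"match sense": 4, "noise": 1}`, `d2 = {"find information": 1, "match sense": 1}` and `d3 = {"unrelated": 5}`, the ranking is `[("d2", 7), ("d1", 6)]`, and `d3` is omitted. A different reading of the summation (for example, counting only the document-side frequencies) would change only the body of `degree_of_match`.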
Improved systems and methods are disclosed in said co-pending application Ser. No. 12/714,980, filed Mar. 1, 2010 and incorporated by reference; they involve making use of textual information regarding the creation of users' profiles. Other improved systems and methods are disclosed in said co-pending application Ser. No. 12/878,675, filed Sep. 9, 2010 and also incorporated by reference; they involve preferred ways of deleting lexical noise when creating profiles of texts. Finally, said related provisional application No. 61/433,875, also incorporated by reference, and patent application Ser. No. 13/324,192, which claims the benefit thereof, describe further improvements in which lexical noise may be deleted from search queries.
It is believed that a need still remains to improve searching and matching by sense, particularly when vast amounts of information are involved such as in Internet searches, common databases (such as DB2, Oracle, etc.), social network systems and other fields, in which explicit contextual information defining a search query or a desired match may not in itself be sufficient to provide highly efficacious processing and results.