The invention is a new electronic search tool for the internet, a wide area network, or even a local file server. According to the method disclosed herein, files are located and retrieved via vector-based mathematics by comparing vectors associated with words in a search string to other vectors assigned to words in a stored file (e.g., a web page). The invention encompasses several aspects generally related to retrieving desired files by assigning words within the files to unique points in space, wherein each point is defined by a mathematical expression. An overall mapping of vectors extends from a common origin to each point in space that represents a word. The method and computerized search tool disclosed herein determine whether the vectors are sufficiently proximate in space to show a match between the file and the search string.
The method of assigning words to particular points in space is useful in all types of automated (electronic) searches, including but not limited to searches on the internet, on business servers, or on a personal computer. Matching words in a search string to words located at particular points in space is conveniently performed using vector mathematics. That is, when the vectors assigned to each word in a search string match the vectors for words located on the n-dimensional object in space, the computer can quickly return the correct result. Vector mathematics uses quicker and easier computations than the traditional text searches that most systems use today.
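The matching step described above can be illustrated with a minimal sketch. It is not the disclosed implementation: the text does not specify here how a word is assigned to its point, so the hash-based mapping `word_to_point`, the choice of three dimensions, and the `tolerance` parameter are all assumptions introduced only for illustration.

```python
import hashlib
import math

DIMENSIONS = 3  # assumed; the text only speaks of an n-dimensional object


def word_to_point(word, n=DIMENSIONS):
    """Hypothetical mapping: derive a deterministic point from a hash of the word.

    The vector for the word runs from the common origin (0, ..., 0) to this point.
    """
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    coords = []
    for i in range(n):
        # Use one 4-byte chunk per coordinate and scale it into [-1, 1].
        chunk = int.from_bytes(digest[4 * i:4 * i + 4], "big")
        coords.append(chunk / 0xFFFFFFFF * 2.0 - 1.0)
    return tuple(coords)


def distance(p, q):
    """Euclidean distance between the end points of two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def file_matches(search_string, file_text, tolerance=1e-9):
    """Return True when every search word's vector is sufficiently
    proximate to the vector of some word in the file."""
    file_points = [word_to_point(w) for w in file_text.split()]
    for word in search_string.split():
        target = word_to_point(word)
        if not any(distance(target, p) <= tolerance for p in file_points):
            return False
    return True
```

For example, `file_matches("vector search", "a search tool based on vector mathematics")` compares only numeric coordinates once the points have been assigned; no character-by-character comparison of the words takes place at match time.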
One of the most useful applications of file search tools is searching the Internet. Search engines known in the art use a crawler to search for pertinent descriptive information about a website and to score it for applicability to the user's search. The descriptive information about the site is created by the site owner and may be valid or invalid, as it is purely arbitrary and controlled by the person creating the website. The crawlers have no way to assess the veracity of the descriptors, which can lead to unexpected search results.
Crawlers from a variety of search engines retrieve descriptive information about a site and attempt to glean the value or rank of the site in terms of its usefulness to the searcher. The rank of the site is determined entirely by the algorithm of the software, which varies from engine to engine.
Problems that exist with current technology include:
(1) Incomplete descriptors about the page
(2) False descriptors about the page
(3) Too many descriptors trying to cover too much ground
(4) Using hyperlinks located within a file to establish validity or rank
The result is that users may end up with thousands of web sites that are ranked in order of importance by search engine criteria. The searcher, however, has no way to control the number or usefulness of search results. A need exists in the area of search engine development for faster and more accurate search results that avoid the above-enumerated problems. One method of doing so is by using vector mathematics in the search process.
Prior efforts to utilize vector mathematics in search engines have been outlined in part within previous publications as follows:
U.S. Patent Application No. 2003/0120639 (Potok), now U.S. Pat. No. 7,072,883, uses a vector space model to store documents for internet searching. The Potok vectors, however, are not related to any particular point in space on which individual words are located. Instead, Potok uses a method by which each unique word in a collection of documents represents a dimension in space, and each document in this space is represented by a vector. Potok proceeds by counting the distinct words in each document and weighting the words according to their frequency of occurrence in a set of documents. The vector representing a particular document is determined by its composite weight of certain words. In the Potok method, vector mathematics is used to compare documents in which individual words have been counted.
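The general kind of vector space model described in this paragraph can be sketched as follows. This is a generic illustration only and is not taken from Potok: the function names are hypothetical, raw word counts stand in for whatever weighting Potok actually applies, and cosine similarity is assumed as the comparison measure.

```python
import math
from collections import Counter


def build_vocabulary(documents):
    """Each unique word in the collection becomes one dimension."""
    return sorted({w for doc in documents for w in doc.lower().split()})


def document_vector(doc, vocabulary):
    """Represent a document by its word count along each dimension.

    Assumption: raw counts are used here as a stand-in for a
    frequency-based weighting scheme.
    """
    counts = Counter(doc.lower().split())
    return [counts[w] for w in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two document vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_u * norm_v)


documents = ["red apple red", "red apple", "blue sky"]
vocabulary = build_vocabulary(documents)
vectors = [document_vector(d, vocabulary) for d in documents]
```

In this sketch the vector describes a whole document, so the comparison is document to document. That differs from the approach summarized earlier in this section, in which each individual word is assigned to its own point in space.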
U.S. Pat. No. 6,684,205 (Modha) discloses a method of organizing data, such as documents on the web, by clustering similar documents into the same group. Modha represents each document in the cluster as a triplet of unit vectors. The first vector represents the words contained in the document. The second vector represents outward bound links to other documents from the document of interest. The third vector represents in-links to the document of interest. Each document expressed by the triplet of vectors lies on a torus in space. Modha uses vector analysis to compare documents, assess their similarity, and cluster the documents accordingly.
U.S. Pat. No. 6,910,037 (Gutta) identifies a system and method for processing search results on the internet. Gutta does not provide a new way of searching per se. After a search has been performed in a traditional term search, Gutta analyzes the resulting document set to determine the term frequencies in each document. Each term in each document can be expressed by a vector with a value based on term frequency. Each document, therefore, is expressed as a resultant vector of the individual term vectors that make up that document. Gutta uses vector mathematics to compare the resultant vectors of each document and organize the documents accordingly. This grouping is known in the art as “clustering” the similar documents together. In this way, Gutta can display like results together as a group. Gutta still relies upon traditional searching by analyzing and matching individual characters and words within each document to achieve a search result. Gutta's goal is only to cluster the search results for displaying to the user.
A need still exists in the art of electronic file searching to provide access to search results faster and more accurately by analyzing individual electronic files on a more fundamental level. The invention herein meets this need.