A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent disclosure, as it appears in the Patent and Trademark Office patent files or documents, but otherwise reserves all copyright rights whatsoever.
1. Field of the Invention
The present invention relates generally to electronic data storage and retrieval. More particularly, the present invention relates to data parameterization, indexing technology, and the use of search indexes to search for and retrieve data from data storage.
2. Description of Related Art
Electronic data/document storage and retrieval applications are relatively common. In fact, the Internet revolution has resulted in ever larger amounts of data being stored and retrieved using various application software, including database software, search engines, and browsers. As the amount of accessible data grows and technology advances, consumers continue to demand faster and more accurate ways to access that data.
Every organization that attempts to develop and maintain an electronic information system today faces a significant challenge. By a commonly cited estimate, roughly 90% of the world's information is stored in the form of e-mails, faxes, reports and word processing documents, while the remaining 10% is stored in spreadsheets and databases. The 90% portion is unstructured and chaotic. This unstructured data cannot be searched and retrieved rapidly and accurately using traditional indexing and searching methods, whether those methods rely on the row-and-column format of spreadsheets and databases or on the keyword format currently used to search unstructured data collections.
Row-and-column format databases are an effective means for storing, searching and retrieving structured data. This structured data is typically represented as a series of records, each record containing several fields into which the actual data is written. Because every data item has an associated field name, and usually a specific data format (e.g., numeric, Boolean or text string), it is a relatively simple matter to create indexes of the values contained in one or more of the fields of the database. It is likewise relatively simple to search such databases using the indexes. However, this method does not work well with unstructured data, since such data cannot easily be modeled using the row-and-column format.
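As an illustration of this related-art approach (not of the invention), the following Python sketch builds an index over one field of a set of structured records. The record layout, the field names and the helper `build_field_index` are hypothetical; the point is only that a fixed, named field makes the index a simple mapping from each value to the records holding it.

```python
from collections import defaultdict

# Hypothetical structured records: every record has the same named fields.
records = [
    {"id": 1, "customer": "Acme", "state": "CA", "total": 120.0},
    {"id": 2, "customer": "Bolt", "state": "NY", "total": 75.5},
    {"id": 3, "customer": "Core", "state": "CA", "total": 310.0},
]

def build_field_index(rows, field):
    """Map each distinct value of `field` to the ids of the records holding it."""
    index = defaultdict(list)
    for row in rows:
        index[row[field]].append(row["id"])
    return dict(index)

state_index = build_field_index(records, "state")
print(state_index["CA"])  # [1, 3]
```

A search on the indexed field is then a single lookup rather than a scan of every record.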
Currently, the favored method for searching unstructured data is by conducting a keyword search. In a keyword search, a user will provide one or more words that the user believes will be found within the text of the data items the user considers relevant, yet will not be found within the text of the data items the user considers irrelevant. More advanced keyword techniques allow the user to specify relations between the keywords, such as specifying that a pair of keywords must be located within the same sentence or paragraph, or within a specified number of words of each other.
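The proximity form of keyword search described above can be sketched in Python as follows. The tokenizer, the helper `within_n_words` and the sample documents are hypothetical; "within n words" is taken here to mean that the two keyword positions differ by at most n tokens.

```python
import re

def tokenize(text):
    """Lower-case word tokens; punctuation is discarded."""
    return re.findall(r"[a-z0-9']+", text.lower())

def within_n_words(text, first, second, n):
    """True if `first` and `second` occur at token positions at most n apart."""
    tokens = tokenize(text)
    first_pos = [i for i, t in enumerate(tokens) if t == first]
    second_pos = [i for i, t in enumerate(tokens) if t == second]
    return any(abs(i - j) <= n for i in first_pos for j in second_pos)

docs = {
    "memo": "The contract renewal was delayed by the vendor.",
    "fax": "Contract terms attached. Renewal is discussed on page nine of the appendix.",
}

# Keep only documents where "contract" and "renewal" are within 2 words.
hits = [name for name, text in docs.items()
        if within_n_words(text, "contract", "renewal", 2)]
print(hits)  # ['memo']
```

Note that every query re-tokenizes the full text of each document; nothing is precomputed.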
Even with these techniques, however, keyword searches are still rather imprecise, and the user is still frequently presented with a significant number of irrelevant data items. Additionally, relevant data is often not retrieved, because the keyword combination specified differs from the keyword combination in the data items to be searched. Thus, users are forced to waste time, both in reviewing all of the data items retrieved to determine which ones are relevant and in running multiple searches with variations on the keywords, to ensure that no relevant data has been missed. Furthermore, these keyword searches are rather slow, since they must search the entire text of the documents to find keyword matches.
Thus, systems and methods are desirable for parameterizing, indexing and searching unstructured and semi-structured data more rapidly and accurately.
The present invention provides systems and methods for data storage and retrieval in which data is stored in a file storage system, data is associated with keywords, and desired data is identified and/or selected by conducting searches of indexes. The indexes map search criteria to the appropriate data.
In an aspect of a preferred embodiment of the invention, user-defined parameters of the data in the file storage system are created, values are associated with the parameters and each parameter-value pair is stored as a contiguous text string.
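A minimal Python sketch of storing a parameter-value pair as one contiguous text string follows. The `=` separator and the helpers `encode_pair` and `decode_pair` are assumptions for illustration; the text above does not specify a delimiter or an encoding.

```python
SEPARATOR = "="  # assumed delimiter; not specified by the embodiment described above

def encode_pair(parameter: str, value: str) -> str:
    """Join a user-defined parameter and its value into one contiguous text string."""
    if SEPARATOR in parameter:
        raise ValueError("parameter name must not contain the separator")
    return f"{parameter}{SEPARATOR}{value}"

def decode_pair(pair: str) -> tuple[str, str]:
    """Split a stored pair back into (parameter, value) at the first separator."""
    parameter, _, value = pair.partition(SEPARATOR)
    return parameter, value

stored = encode_pair("Author", "J. Smith")
print(stored)               # Author=J. Smith
print(decode_pair(stored))  # ('Author', 'J. Smith')
```

Forbidding the separator only in parameter names lets values contain it freely, because decoding splits at the first occurrence.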
In another aspect of a preferred embodiment of the invention, different data items within the file storage system can have different parameters.
In another aspect of a preferred embodiment of the invention, the parameter-value pairs provide structure to unstructured data, creating semi-structured data.
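The two aspects above can be sketched together in Python: each unstructured document keeps its free text but also carries its own list of parameter-value strings, and two documents need not share the same parameters. The `DataItem` class, its `tag` method, the `parameter=value` string format and the sample documents are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class DataItem:
    """An unstructured document plus the parameter-value strings attached to it."""
    name: str
    text: str
    pairs: list[str] = field(default_factory=list)

    def tag(self, parameter: str, value: str) -> None:
        # Each pair is kept as one contiguous "parameter=value" string (assumed format).
        self.pairs.append(f"{parameter}={value}")

memo = DataItem("memo-17", "Please review the attached draft before Friday.")
memo.tag("DocType", "Memo")
memo.tag("Author", "Lee")

invoice = DataItem("inv-0042", "Invoice for consulting services rendered in March.")
invoice.tag("DocType", "Invoice")
invoice.tag("Amount", "1200.00")
invoice.tag("Vendor", "Acme")

print(memo.pairs)     # ['DocType=Memo', 'Author=Lee']
print(invoice.pairs)  # ['DocType=Invoice', 'Amount=1200.00', 'Vendor=Acme']
```

The text itself is untouched; the attached pairs are what make the item semi-structured.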
In another aspect of a preferred embodiment of the invention, index entries are identified by comparing a search criterion to a parameter-value pair using a Boolean comparison of two text strings.
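One way to picture this aspect is the Python sketch below, which builds an index keyed by parameter-value strings and identifies matching entries by testing `entry == criterion` for each one. The helpers `build_pair_index` and `find_entries`, the `parameter=value` format and the sample data are hypothetical. The explicit loop keeps the Boolean string comparison visible; a practical index would more likely use a hash or sorted lookup, which still ends in the same string-equality test.

```python
from collections import defaultdict

def build_pair_index(items: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each parameter-value string to the names of the items that carry it."""
    index: dict[str, list[str]] = defaultdict(list)
    for item_name, pairs in items.items():
        for pair in pairs:
            index[pair].append(item_name)
    return dict(index)

def find_entries(index: dict[str, list[str]], criterion: str) -> list[str]:
    """Return items whose index entry string equals the criterion string.

    Each comparison `entry == criterion` is a Boolean test on two text strings.
    """
    matches: list[str] = []
    for entry, item_names in index.items():
        if entry == criterion:
            matches.extend(item_names)
    return matches

items = {
    "memo-17": ["DocType=Memo", "Author=Lee"],
    "inv-0042": ["DocType=Invoice", "Vendor=Acme"],
    "inv-0043": ["DocType=Invoice", "Vendor=Bolt"],
}
index = build_pair_index(items)
print(find_entries(index, "DocType=Invoice"))  # ['inv-0042', 'inv-0043']
```

Because parameter and value are compared as one string, a single equality test checks both at once.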
In another aspect of a preferred embodiment of the invention, an index of parameter-value pairs is used to translate semi-structured data into structured data.
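A hedged Python sketch of this translation follows: it reads an index of parameter-value strings (each mapped to the items carrying it) and produces a row-and-column table. Each distinct parameter becomes a column, and an item lacking a parameter gets `None` in that column. The function `index_to_table`, the `parameter=value` format and the sample index are assumptions for illustration.

```python
def index_to_table(index: dict[str, list[str]]) -> tuple[list[str], list[list]]:
    """Build a header and rows from a parameter-value index."""
    columns: list[str] = []
    rows: dict[str, dict[str, str]] = {}  # item name -> {parameter: value}
    for pair, item_names in index.items():
        parameter, _, value = pair.partition("=")  # assumed "parameter=value" format
        if parameter not in columns:
            columns.append(parameter)
        for item in item_names:
            rows.setdefault(item, {})[parameter] = value
    header = ["Item"] + columns
    table = [[item] + [values.get(col) for col in columns]
             for item, values in rows.items()]
    return header, table

index = {
    "DocType=Memo": ["memo-17"],
    "DocType=Invoice": ["inv-0042", "inv-0043"],
    "Author=Lee": ["memo-17"],
    "Vendor=Acme": ["inv-0042"],
    "Vendor=Bolt": ["inv-0043"],
}
header, table = index_to_table(index)
print(header)  # ['Item', 'DocType', 'Author', 'Vendor']
for row in table:
    print(row)
# ['memo-17', 'Memo', 'Lee', None]
# ['inv-0042', 'Invoice', None, 'Acme']
# ['inv-0043', 'Invoice', None, 'Bolt']
```

The resulting table can then be loaded into a conventional row-and-column database and queried with ordinary field indexes.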