The present invention relates generally to a data acquisition and perusal system and method for locating, indexing, and accessing information, and more particularly to a data acquisition and perusal system and method for acquiring, creating, manipulating, indexing, and perusing data, and to a method and system for locating and retrieving known or unknown data for the same purposes.
Computers were intended to provide an effective and efficient way for humans to manage, locate, peruse and manipulate data or objects. For example, a first, basic system and method is that demonstrated by modern word processor applications, which have some search and text access capabilities; however, as far as is known, they are limited to the file that is currently open. Employing this method, the user can request the location of a word in the text. Within an individual file, the computer will then take the user sequentially to each location of that text. Only string searches are allowed. By repeatedly running the search, the user can sequentially move from result to result. While it might be possible to open many files simultaneously, the available resources and memory make this impractical.
A second, improved system and method, enabled by some computer operating systems, includes applications that allow users to search all available files accessible by certain software applications for words or simple phrases. These applications still require the user to open each of the files of interest in a word processor, viewer or other application referred to in the first system and method to access the data. The search time required is relatively great because the available data has to be sequentially read and compared with the query.
A third system and method used by software applications provides improved search capabilities and is commonly known as a "search/retrieval engine". Among other things, search/retrieval engines can essentially search and access many thousands of files simultaneously and very quickly by using pre-generated indexes of the data. For example, a user can query an encyclopedia converted to an indexed database, and by the use of highlighted text, quickly determine every place a word or phrase occurs in the text, and have the ability to instantly view those occurrences as desired. These products even take the user sequentially to each instance of highlighted text or "hit." The computer can then take the user from hit to hit.
Converting a database like an encyclopedia into a format useable by a search/retrieval engine is not simply a matter of converting its volumes into electronic files accessible by the user's computer. For efficient search performance, the contents of the files are logically indexed as to location, frequency, etc. The search functions of the engine actually search the index to determine if the query criteria are met, and then the locations of valid results are passed to the retrieval functions to display them. Without a well-designed index, a computer could take a long time to perform a search for a simple phrase that can otherwise be performed in a fraction of a second. Some search/retrieval engine application vendors allow users to generate indexes for their own files through an indexing utility, and others intend for indexing to be done only by electronic database publishers by use of a separate application designed for that purpose.
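By way of a non-limiting illustration, the kind of index described above, recording the location and frequency of each word, can be sketched as a simple inverted index. The function names `build_index` and `search`, the sample file names, and the word-position scheme are assumptions made for this sketch only; they are not taken from any particular search/retrieval engine.

```python
import re
from collections import defaultdict

def build_index(files):
    """Map each lowercased word to {file name: [word positions]}."""
    index = defaultdict(dict)
    for name, text in files.items():
        for position, match in enumerate(re.finditer(r"\w+", text.lower())):
            index[match.group()].setdefault(name, []).append(position)
    return index

def search(index, word):
    """Look a word up in the index; the source files are never re-read."""
    return index.get(word.lower(), {})

# Hypothetical two-volume "encyclopedia" used only for this example.
files = {
    "vol1.txt": "The whale is a mammal. The whale swims.",
    "vol2.txt": "A shark is a fish.",
}
index = build_index(files)
hits = search(index, "whale")                 # locations of each hit
frequency = {name: len(pos) for name, pos in hits.items()}
```

Here the search consults only the pre-generated index, and the stored positions are what a retrieval function would use to highlight each hit.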
Currently, a user desiring to employ the speed of a computer to search for and retrieve data from multiple disparate source files generally has three choices: (1) use the basic first system and method above to open each file in a word processor application and search them individually; (2) use the second system and method above, search each file using an operating system application, and then open each file in the list of results in a word processor application; and (3) obtain an indexed database of the sources along with a search/retrieval engine from an electronic publisher, or create a database usable by a search/retrieval engine.
As far as is known, however, no application has been devised that adequately deals with the internet and yields the results described in the third system and method above. The internet is a vast and burgeoning source of information concerning nearly every subject. But the internet is largely composed of files available in SGML and its derivatives, including HTML and XML, and other hypertext type formats. A hypertext markup language such as HTML is a structured, yet ambiguous language. In this application, reference is generally made to HTML files and documents, HTML being the most common format. However, it is understood that this includes the SGML format and its other derivatives, including XML and future modifications, implementations, and standards for use in data files, databases and the internet. As far as is known, having a computer automatically and accurately determine the exact location of text within an HTML type formatted document, object, or file is not accomplished in the prior art. Consequently, there is no known practical method or system whereby a user can efficiently and effectively use a computer's speed to search for and retrieve data from a set of files accessible by the computer and get pinpoint, highlighted display of the designated text. It should be noted that the information desired may be in files, objects, or documents that are unknown to the user yet available to the user. In addition to the internet, many enterprises have extensive repositories of information stored in electronic form that may contain information an authorized user may want to locate and access. Even at the lowest level, an individual computer generally contains unknown or forgotten data that the user would find valuable. None of these repositories of information can be accessed by the current art as efficiently as is desired.
Using the current art in the third system and method above, users can add electronic bookmarks to enable them to quickly return to any part of any volume of an encyclopedia, referred to in the example above, and they can copy portions for insertion into other documents of their own creation. By use of hypertext links appearing within the database, a user is able to instantly view related data for which he had not searched. The links are generated according to a rationale applied when the database index was prepared. Adding hypertext links usable within a database is generally a more complex process. The links are intended to appear to the user in a color or format distinguishable from other data, and when activated, the computer is directed to display another highlighted portion of the database. The discussion is facilitated by referring to the instructions to the computer within links as "pointers" and to what they link to as "targets". A database can theoretically have an unlimited number of identical pointers (even though what the user sees can be different for some or all of them), but any pointer can generally only have one target (a specific area of the database to display), and targets are invisible to the user. Links must be sensitive to the context of the document, and context sensitivity requires intelligence. Thus, adding links to a database requires human intervention because current computers inherently lack any intelligence. Although simple linking based upon discernible patterns within text and targeted toward files matching those patterns can easily be done programmatically, human intervention is still required to design and initiate the process. Further, such favorable linking circumstances rarely exist within typical, disparate data, and even greater human intervention is then required.
Consequently, search/retrieval engine vendors essentially leave linking up to the creator of the search engine software or electronic publisher to do manually, and the links are generally not customizable by the user. Thus, the vendors commonly provide technical specifications on how to craft pointer and target codes for the software and how to write programs to link their unique databases. However, some word processing and other applications permit users to craft links among compatible files using manual processes.
If a user desires to have the searchable data include context-sensitive links, the choices are generally reduced to: (1) obtaining a pre-linked database from an electronic publisher; or (2) creating a custom database and manually inserting links individually or by use of a custom program written for the unique situation. Beyond the problems of availability and lack of customization, a fundamental problem with the first choice is that a publisher may not consider the same links to be important as a user does. Thus, the publisher may include links that are not important to the user and may not include links that would have been important. A fundamental problem with the second choice is that manually inserting links requires a substantial amount of time and trouble that quickly outweighs any potential benefit of manually inserting links as the quantity of data increases. As far as is known, the current art does not include a system to create links by designating "pointers" and "targets" and having the program automatically create links that are all valid.
It would be highly beneficial for the results of computer searches that locate information from various sources to be quickly and easily saved locally for access at a later time, without having to redo the search and re-access the sources of information. Saving the results locally saves search time and avoids repeating a search that may no longer locate the previously found information. The locally saved information can also be quickly accessed without having to relocate it. An object of the invention is to allow someone to create his or her own custom, organized database that can be utilized effectively. Each time relevant information and files are located, they can be put into a database, indexed and made available for use.
The limitations of prior systems are overcome by the present invention, which is an improved method and system for acquiring, creating, manipulating, indexing, and perusing data, and for locating and retrieving known or unknown data for the same purposes. In a preferred embodiment, the system is a stand-alone application residing on a user's personal computer that enables the user to create fully searchable databases or local sources of any size from any electronic documents accessible by the computer and selected by the user. It also enables the user to accurately and methodically locate undiscovered documents that may be of interest. By use of a word processing means integrated into the application, it enables the user to create and include new documents into the database or to create retrievable documents within the application. Any databases or documents that the user creates can be password protected to restrict access by unauthorized users who may have access to the computer.
The invention provides a user with the ability to train a search engine to automatically and methodically search the internet or other data sources according to derived or evolved limitation criteria. Each set of such criteria is stored for reuse or modification as the user desires. Without limiting the criteria, the system could be directed to retrieve and completely index every file that existed on its available data sources. While that would guarantee that all data in those files would be searched for data that the user wants, there are practical limitations.
If the data source is vast, like the internet, the system would attempt to index all of its files, objects, or documents, but it would quickly encounter storage limitations on the user's computer if default limitations were not automatically imposed. By artfully estimating the time and storage requirements and matching them to available resources, the system guides the user to impose limitations to produce the desired results. This method allows users to completely index all of some data sources, to filter and sort smaller percentages of greater data sources, or to survey large data sources such as the internet. In the latter case, the user can refine the resultant survey to identify smaller, but more relevant, parts of the data sources. After sufficiently iterating the refinement process, the user will be able to index and search all selected and relevant data. Thus, this system and method enable a user to predictably and efficiently solve the problem of selecting and comprehensively searching relevant data from sources with unknown content by combining human intelligence with the indexing and search/retrieval capabilities of a computer. Since the system can be trained to repeat all or parts of previous actions, the user's instructions can be perfectly carried out while repeatedly using different search criteria.
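As a rough, non-limiting sketch of matching estimated storage requirements to available resources, the following example selects files in order until an estimated budget is exhausted. The function name `plan_indexing`, the `index_ratio` overhead figure, and the sample sizes are hypothetical values chosen for illustration; the specification does not state how the estimate is actually computed.

```python
def plan_indexing(file_sizes, available_bytes, index_ratio=0.5):
    """Select files, in the given order, whose local copies plus an
    estimated index fit within the available storage.

    index_ratio is an ASSUMED index overhead, expressed as a fraction
    of each file's size; a real system would measure or tune it.
    """
    selected, used = [], 0
    for name, size in file_sizes:
        cost = int(size * (1 + index_ratio))   # file copy + estimated index
        if used + cost > available_bytes:
            break                              # limit reached; stop here
        selected.append(name)
        used += cost
    return selected, used

# Hypothetical candidate files (name, size in bytes) and a 5000-byte budget.
candidates = [("a.html", 1000), ("b.html", 2000), ("c.html", 4000)]
selected, used = plan_indexing(candidates, available_bytes=5000)
```

A limit computed this way could be presented to the user as a default, which the user then tightens or relaxes while refining the survey.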
Uses of the system include those identified herein as well as many others. For example, a vendor could prepare a database, kept on a remote server that contains continually updated information, to be accessed by a computer running this system. Among other things, the database could contain information authorizing the user to continue to use the system and query the database. Independent of the server, the user could then employ all or part of the system's capabilities for other purposes as desired.
In one embodiment, commercial electronic database publishers could use a system according to the present invention as a publishing system to create databases with more or less homogeneous content. For example, one publisher may produce a monthly searchable, linked database containing issued United States patents, another might produce a linked database containing decisions of appellate courts, and another might produce a linked database containing documents required to be filed by various regulatory agencies, etc. Using prior systems to produce such databases requires substantial programming skills to incorporate reference links within the database, but in practice, many such links are invalid because a referenced document does not exist. Using the system according to the present invention does not require such skills because it automatically creates only valid and verified links. The graphical user interface is easily modified to comport with a particular "look and feel" desired by the publisher.
In another embodiment, a data provider could maintain a continually updated database of information (e.g., statistical or a glossary) on a remote server that the user accesses via a network such as the internet. Upon being started by the user, an application automatically connects to the remote database when information from the database is needed and disconnects once it is obtained. If the remote database has changed, the user will be notified and the user's database index can be regenerated to accommodate the changes. By storing user authorization codes on the remote server in a database or table for that purpose, the provider can verify that the user is still entitled to access the service provided. The application on the user's computer can automatically be rendered dysfunctional by the passage of time unless it successfully renews its operating status by connecting to the provider's authorization code database. This embodiment provides advantages to both the data provider and the network service provider: (1) the system application can essentially be provided on a subscription or rental basis without the necessity of distribution media or elaborate license or copyright protection schemes; and (2) the network service provider's effective bandwidth is greatly increased because the system only connects to the remote server on an as-needed, when-needed basis instead of requiring an active modem connection continuously.
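The time-limited authorization described in this embodiment might be sketched, purely for illustration, as follows. The class name `Authorization`, the 30-day validity period, and the representation of the provider's authorization code table as an in-memory set are all assumptions of this sketch; an actual implementation would query the remote server over the network.

```python
from datetime import datetime, timedelta

class Authorization:
    """Tracks whether the local application may still operate."""

    def __init__(self, valid_days, last_renewed):
        self.valid_for = timedelta(days=valid_days)
        self.last_renewed = last_renewed

    def is_active(self, now):
        """The application stops working once the period has elapsed."""
        return now - self.last_renewed <= self.valid_for

    def renew(self, code, provider_codes, now):
        """Renew only if the provider's table still lists the user's code.

        provider_codes stands in for the remote authorization database.
        """
        if code in provider_codes:
            self.last_renewed = now
            return True
        return False

# Hypothetical user code and dates, for illustration only.
auth = Authorization(valid_days=30, last_renewed=datetime(2024, 1, 1))
active_mid_january = auth.is_active(datetime(2024, 1, 15))
active_in_march = auth.is_active(datetime(2024, 3, 1))
renewed = auth.renew("user-123", {"user-123"}, datetime(2024, 3, 1))
```

Because renewal is the only step that needs the server, the application can remain disconnected between renewals, consistent with the as-needed connection described above.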
Another object of the invention is to provide a method and system for storing search results from various sources, including the internet, with internet format files, objects, or documents. The locally stored results can be automatically indexed for fast searching and hyperlinked by the user to make subsequent finding of the previously located information quick and simple.
The system and method of the invention overcome the above-noted problems of the prior art and can be used for general purpose data acquisition, creation, manipulation, indexing, and perusal while connecting to remote data sources only as needed.
A data acquisition and perusal system and method according to the present invention includes a database selection module, a link module, a database index generator module and a search module. The database selection module enables selection of a plurality of files, objects, or documents for inclusion into at least one selectable database. The link module enables custom links to be defined between selected terms of selected files of the selectable database. The database index generator module enables generation of a searchable index of the data contained in the selectable database including the custom links so that the searchable index includes only valid links. The search module enables a search to be performed of the searchable index according to a search criterion.
The plurality of different files may include a plurality of different file types, such as internet formatted files, objects, or documents, including HTML type formats, and word processor formats, text formats, RTF formats, etc. Generally, each database includes one or more files of a particular type. The database selection module may be configured to enable selection of the plurality of files both locally and remotely via a network. For example, the data acquisition and perusal system and method may be implemented on a computer coupled to a network, where the network may further be connected to the internet. The data acquisition and perusal system and method may be configured to copy internet files to a local storage disk, or to simply maintain a link to the internet files of interest.
The link module enables association of any selected link term with any of the plurality of files in the selectable database. The link module may further enable at least one alias term to be defined for any selected link term to enable a link to be established between each alias term and any of the files in the database. Each of the files may further include one or more fields. The link module further enables field links to be defined between any two or more of the plurality of files. Such field links may be defined according to patterns, where the patterns may further be defined using wildcard characters that each replace one or more digits or characters.
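One possible way to picture pattern-based field links is sketched below, using wildcard matching from Python's standard `fnmatch` module, in which `?` replaces exactly one character. The function name `field_links`, the `DOC-????` identifier scheme, and the sample file names are hypothetical and serve only to illustrate the idea; they are not part of the described system.

```python
from fnmatch import fnmatchcase

def field_links(text, pattern, id_to_file):
    """Link each term in the text that matches the wildcard pattern to the
    file whose identifier field equals that term.

    Terms that match the pattern but have no corresponding file are
    skipped, so every link returned points at an existing file.
    """
    links = {}
    for token in text.split():
        term = token.strip(".,;()")            # drop surrounding punctuation
        if fnmatchcase(term, pattern) and term in id_to_file:
            links[term] = id_to_file[term]
    return links

# Hypothetical database: identifier field value -> file in the database.
id_to_file = {"DOC-1001": "contract.html", "DOC-1002": "memo.rtf"}
text = "See DOC-1001 and DOC-9999, but not DOCUMENT-1."
links = field_links(text, "DOC-????", id_to_file)
```

In this sketch `DOC-9999` matches the pattern but is dropped because no file carries that identifier, which mirrors the goal of an index that contains only valid links.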
The search module may further enable sorting of any files of the selectable database that meet the search criterion. In one embodiment, such sorting may be according to the respective fields of the files. For example, the files may be sorted by date, by name, or by any other field types or descriptions.
The data acquisition and perusal system and method may further include at least one input device and a display utility including a graphic user interface (GUI). The display utility enables graphic interaction with the database selection, link, and search modules via the input device. The display utility displays at least portions of files in the selectable database that meet the search criterion. The portion of a displayed file typically includes any text that meets the search criterion. Such text is usually graphically indicated, such as via color, style, highlighting, etc. Any selected link terms defined via the link module are indicated in a similar manner. Further, the display utility enables interaction with any indicated selected link terms via the input device to enable perusal of linked files in the selectable database. For example, a user may double click on highlighted text indicating a link term in a displayed file, whereupon the data acquisition and perusal system and method jumps to and displays the linked file. Operation is similar for alias link terms if defined.
The system and method may automatically, unambiguously, and accurately place reference links among documents within a database it creates according to a schema controlled by the user. These links enable the user to instantly view a file, object, or document referenced by another file, object, or document currently being viewed and to backtrack to any point of origin in the database. The system and method do not modify or make extraneous copies of the contents of the original database files, objects, or documents. If a file, object, or document is modified or deleted, the integrity of the database is not affected with respect to the other files, objects, or documents because either the database (i.e., the index) will be regenerated, or an error message will be presented telling the user that the file, object, or document has been modified or deleted. The application also may give the user the option to create compressed, password-protected databases for secure dissemination to other users or simply to secure the files, objects, or documents and database indexes for personal use.
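As a minimal sketch of how a modified or deleted file could be detected without altering the original, a digest of each file may be recorded when the index is generated and compared later. The use of SHA-256, the function names `fingerprint` and `check_file`, and the sample file are assumptions of this example; the specification does not say which detection mechanism is used.

```python
import hashlib
import os
import tempfile

def fingerprint(path):
    """Return a SHA-256 digest of the file's current contents."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()

def check_file(path, recorded_digest):
    """Compare a file against the digest recorded when it was indexed."""
    if not os.path.exists(path):
        return "deleted"
    if fingerprint(path) != recorded_digest:
        return "modified"
    return "unchanged"

# Demonstration on a temporary, hypothetical file.
statuses = []
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "report.txt")
    with open(path, "w") as handle:
        handle.write("original text")
    recorded = fingerprint(path)               # stored at indexing time
    statuses.append(check_file(path, recorded))
    with open(path, "w") as handle:
        handle.write("edited text")
    statuses.append(check_file(path, recorded))
    os.remove(path)
    statuses.append(check_file(path, recorded))
```

A status of "modified" or "deleted" would be the point at which the index is regenerated or the error message described above is shown to the user.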
Embodiments of a system and method, in accordance with the principles of the present invention, provide methods and systems for acquiring, creating, manipulating, indexing, and perusing data; for locating and retrieving known or unknown data for the same purposes; for automatically connecting to remote network computers on an as-needed, when-needed basis; for validating a user's rights to use the system; and for securing pertinent data from unauthorized use.