1. The Field of the Invention
This invention relates to a data extraction tool and, more particularly, to novel systems and methods for searching, organizing, and presenting information stored in electronic format.
2. The Background Art
In what is known as the information age, information is readily available electronically, through information repositories known as datastores and databases. Datastores are substantially unorganized collections of data, while databases are indexed in some fashion. The Internet, the world's largest database, has made available enormous quantities of information to anyone with a personal computer and Internet access. This can be very helpful for people who wish to learn about something or conduct business in the convenience of their own homes. However, it can also be tremendously time-consuming to locate a desired bundle of information among the millions available.
The Internet is organized only by the name of each web site. Each individual or group maintaining a web site decides how that web site will be organized. Thus, there is no official catalog of information available on the Internet. Anyone desiring information must hypothesize which web sites would be likely to have the desired data and navigate through those web sites according to the organization set up by the web site's operator. Although other databases and datastores are smaller, many exhibit the same organizational difficulties.
Some companies have developed portals to automate a portion of the search for information. Most of these portals are text-based. Currently available portals include search engines and directories.
To use a search engine, a user provides a set of words to search for, and the search engine returns a list of “hits” or web sites containing those words. Search engines are advantageous in that they require little user input or understanding of the operation of the search engine. However, they can be difficult to work with for a number of reasons.
For example, the list may contain a vast number of hits, few of which actually relate to the desired piece of data. Conventional keyword searching returns any instance of the word being sought, regardless of the way the word is used in the web site. Although a user may add additional keywords to narrow the search, there is often no combination of keywords that excludes all irrelevant pages while retaining all relevant ones.
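By way of illustration only, the keyword-matching behavior described above may be sketched as follows. The page names, page text, and the function `keyword_search` are hypothetical and are not drawn from any particular search engine; the sketch assumes a simple whole-word match over a small in-memory collection.

```python
import re

# Hypothetical miniature collection of pages; names and text are invented.
PAGES = {
    "fishing-guide": "Salmon run upstream each fall, and anglers follow them.",
    "food-safety": "Salmonella is a bacterium that causes food poisoning.",
    "restaurant-review": "The salmon was overcooked, but nobody got food poisoning.",
}


def keyword_search(pages, keywords):
    """Return the names of pages containing every keyword as a whole word.

    The match ignores how each word is used on the page, which is the
    limitation of conventional keyword searching described in the text.
    """
    hits = []
    for name, text in pages.items():
        # Split the page into lowercase words so matching is case-insensitive.
        words = set(re.findall(r"[a-z]+", text.lower()))
        if all(keyword.lower() in words for keyword in keywords):
            hits.append(name)
    return hits


# A user interested in fishing also receives a restaurant review.
print(keyword_search(PAGES, ["salmon"]))

# Adding keywords narrows the list, yet the restaurant review still matches
# a query intended to find food-safety information.
print(keyword_search(PAGES, ["food", "poisoning"]))
```

In this sketch, both queries return a page that merely mentions the words sought, showing how a hit list may include locations unrelated to the user's purpose.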
Also, many conventional search engines return only the home page of a web site that contains the keyword. It is then up to the user to find the keyword in a site and determine whether it is relevant. This requires a user to figure out how the site is organized and follow the right links. This can be difficult because there may be no links that clearly indicate where the keyword is.
The output from most search engines is simply a page of links to possibly relevant sites. A user may wish to supplement or rearrange the search results, but the way the results of a search are formatted typically makes addition or modification of criteria difficult or impossible.
Moreover, information obtained through a search often becomes outdated. Currently, a user must revisit previously found sites to determine whether the old information is still valid. Additionally, a user must perform a new search to locate any newly relevant sites and search through those sites for relevant information.
Directories function differently than search engines. Rather than search based on keywords provided by a user, most directories provide a user with an information scheme, often hierarchically organized. The user then chooses what type of information to search for, designating narrower groups of information with each choice. Ultimately, the user reaches the bottom level of the hierarchy and receives a list of links to information within that level.
Directories are advantageous in that information concerning a certain topic is typically grouped together. A directory probably will not inundate a user with information, but rather provide a few links believed to be important by the creators of the directory. Nevertheless, directories have drawbacks of their own.
For example, traditional directories contain information deemed of value by those who compile them. A user may have an entirely different view of what is important and what is irrelevant. A user may thus find that information he or she needs simply is not available on the directory.
Also, directories take time to navigate. A user must make a series of decisions to reach any useful information at all. Even then, a user may find it necessary to backtrack and choose a different route through the hierarchy. Since a user cannot fashion groupings of information, he or she may be required to view several branches of the hierarchy to obtain the full range of information he or she desires.
Moreover, if a user does not know how to classify the bit of information sought, he or she may not even be able to find it in the directory. For example, a user desiring to find the meaning of “salmonella” in a biological directory may spend great amounts of time looking through the “aquatic life” branch of the directory, without ever realizing that “salmonella” is more properly classified as “microscopic life.” The more a user's view of how information should be organized differs from that of the directory's creators, the more difficult it will be for the user to find information in the directory.
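By way of illustration only, hierarchical navigation of a directory may be sketched as follows. The category names, the entries, and the functions `browse` and `found_under` are hypothetical; the sketch assumes a directory held as nested categories ending in lists of entries.

```python
# Hypothetical biological directory; categories and entries are invented.
DIRECTORY = {
    "biology": {
        "aquatic life": {
            "fish": ["salmon", "trout"],
            "mammals": ["dolphin", "whale"],
        },
        "microscopic life": {
            "bacteria": ["salmonella", "e. coli"],
        },
    },
}


def browse(directory, choices):
    """Follow one category choice per level and return what lies at the end."""
    node = directory
    for choice in choices:
        node = node[choice]
    return node


def found_under(node, term):
    """Report whether the term appears anywhere beneath the given node."""
    if isinstance(node, list):
        return term in node
    return any(found_under(child, term) for child in node.values())


# A user who guesses the wrong branch never reaches the entry sought.
wrong_branch = browse(DIRECTORY, ["biology", "aquatic life"])
print(found_under(wrong_branch, "salmonella"))

# The entry is reachable only through the branch chosen by the compilers.
right_branch = browse(DIRECTORY, ["biology", "microscopic life"])
print(found_under(right_branch, "salmonella"))
```

In this sketch, the entry can be reached only by a user who makes the same classification choices as those who compiled the directory.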
Consequently, there is a need for a data extraction tool capable of providing many of the benefits of both search engines and directories, without the drawbacks listed above. For example, there is a need for a tool that could reliably provide a list of highly relevant information locations based on a simple text query. Furthermore, such a tool should provide ready access to the exact location of the information. Preferably, the tool would supply the user with a list of locations or links that can be easily sorted and updated for the convenience of the user. Furthermore, the tool should not require that the user understand the configuration of the tool's internal databases.
In addition to the problems mentioned above, current searching methods are deficient in a number of other ways. Consequently, a more advanced data extraction tool may provide numerous benefits to those desiring to obtain information from a large datastore or database, such as the Internet.