At the dawn of civilization, cave dwellers may have sat within the dark confines of their cave and scrawled pictorial figures on the walls to record information by the flickering light of their fires. The amount of information recorded by these early historians was probably limited by the amount of space on the stone walls of their cave. With such a limited amount of recorded information, cave dwellers typically searched for information by visually scanning the walls.
Later, the Egyptians used papyrus as a writing medium. The known universe of information was larger, and more information was deemed worthy of recording. As the amount of recorded information increased, searching for specific information became more laborious and difficult.
During the Dark Ages, religious institutions such as monasteries served as centers of learning. These monasteries also served as the archives for much of the information recorded during that period. Many monks spent their lives recording information in books. Especially important texts, such as religious works, were copied by hand and intricately illustrated. Again, searching for information became more difficult as the amount of recorded information grew.
As a result of the Industrial Revolution, the perceived limits of the known universe of information expanded dramatically. The amount of recorded information grew at an astounding pace into the space age, as the boundaries of human thought and existence were pushed farther and farther out.
As computers were introduced into society, more and more information was recorded and made relatively accessible. The global Internet provides an almost unthinkable amount of recorded information, and this amount appears to grow exponentially each day. For example, the World Wide Web (the Web) is the portion of the global Internet consisting of hypertext-enabled pieces of information. A few years ago, the Web contained mainly information focused on a few niche areas, such as science, UNIX, and UFOs. Today, information on the Web covers nearly all major subject areas and includes multimedia formats, such as video and audio, in addition to traditional text. Even with this vast amount of information already online, the Web continues to grow at approximately twenty percent per month, according to some commentators.
With so much information accessible via a computer and a modem, many people use the online resources of the Internet and the Web as sources of information. However, searching this vast amount of information can be problematic and very time-consuming. The proverbial needle in a haystack takes on new meaning in today's digital culture of endless Web sites and freely accessible data warehouses. There is therefore a need for ways to search for and access the right information efficiently and in a timely manner, so as to avoid frustration and information overload.
Existing information retrieval systems use many different methods of searching for information, and these methods usually depend on how the information is classified. Information within a database can be classified into hierarchical categories. This organizes the information vertically, beginning with very high-level headings and working down to lower-level headings, which is how most people have traditionally been trained to organize information. Some search engines on the Web, such as the YAHOO! searching tool found at the Internet address or uniform resource locator (URL) http://www.yahoo.com, use this type of hierarchical categorization to organize online information.
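Hierarchical (vertical) classification can be sketched as a tree of headings in which each Web site is filed under exactly one path from the top heading down. This is a minimal illustration only; the `Category` class, its `file` and `browse` methods, and the example URL are hypothetical and do not describe how any particular search engine stores its directory.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """One heading in a hypothetical category hierarchy."""
    name: str
    children: dict[str, "Category"] = field(default_factory=dict)
    urls: list[str] = field(default_factory=list)

    def file(self, path: list[str], url: str) -> None:
        """File a URL under a path of headings, creating headings as needed."""
        node = self
        for heading in path:
            node = node.children.setdefault(heading, Category(heading))
        node.urls.append(url)

    def browse(self, path: list[str]) -> list[str]:
        """Walk down from the top heading and return the URLs filed at that node."""
        node = self
        for heading in path:
            node = node.children[heading]
        return node.urls


root = Category("root")
root.file(["Science", "Astronomy"], "http://example.com/stars")
print(root.browse(["Science", "Astronomy"]))  # the site is found only at its own heading
print(root.browse(["Science"]))               # nothing is filed directly under "Science"
```

Because a site is reachable only along the path a person chose for it, a user who browses a different heading will not find it, which is the misclassification problem described next.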
Consider how new online information (in the form of the URL of a new Web site) is regularly added to the Web. Once a Web search engine such as the YAHOO! searching tool is informed of the new online information, a human being usually classifies it by deciding which hierarchical heading it should be associated with. Unfortunately, this makes the categorization subjective, depending on who actually performs it, and subjective classification may lead to misclassified information. For example, a Web site author may believe the site belongs under a popular hierarchical heading, while the person making the decision may believe it is more appropriately classified under a less popular one. A user who then looks for this new Web site under the popular heading may be confused and frustrated.
Information can also be classified in a non-hierarchical, or horizontal, fashion for searching. Searching with horizontal classifications is similar to searching bottom-up through the information in the database for selected terms, also called keywords. One search engine tool that looks for selected terms is the ALTAVISTA search engine tool created by the Digital Equipment Corporation, which can be found online at the URL http://www.altavista.digital.com. The ALTAVISTA search engine tool employs a bottom-up technique in which a selected term is associated with various documents by using an inverted index as a lookup table. The inverted index is essentially a table that maps each term to the documents containing it. In this manner, horizontal classification supports content-based searching for documents that contain the term. However, horizontal classification does not usually lend itself to searching based upon the context in which the term appears in a document.
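An inverted index of the kind described above can be sketched in a few lines. This is a minimal example assuming whitespace tokenization, lowercase terms, and boolean AND queries; the function names and sample documents are hypothetical and make no claim about how ALTAVISTA itself was implemented.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase term to the set of document IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], *terms: str) -> set[str]:
    """Return the documents containing every query term (boolean AND)."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


docs = {
    "d1": "Jaguar car dealer",
    "d2": "jaguar habitat in the rainforest",
}
index = build_inverted_index(docs)
print(search(index, "jaguar"))         # both documents match the bare term
print(search(index, "jaguar", "car"))  # only d1 contains both terms
```

The lookup finds every document containing "jaguar" whether it concerns the car or the animal, which illustrates why content-based keyword matching says little about the context in which a term is used.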
Given the existing kinds of information retrieval systems and the vast amount of recorded information that is usually searched, many problems can arise when trying to provide efficient information retrieval. One type of information retrieval system interacts with an inquiring party using scripted questions to retrieve information efficiently. Suppose, for example, that the inquiring party wants to access certain information within a database via the information retrieval system. To handle the inquiry, the system usually provides large scripts of questions written to guide the inquiring party through an interactive process of finding the desired information within the database. These scripts are usually static: both the questions asked and the order in which they are asked are predetermined. Because the script is static, the system prompts the inquiring party with each question in the script according to the predetermined sequence.
Typically, a static script is created and maintained for each domain, or grouping of information, within the database. For example, if the database contains classified advertising information, the domains may include restaurants or automobiles. In other words, a domain is a high-level category of the information in the database. Each domain may have a corresponding static script, which the system uses to find the desired information associated with that domain.
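A per-domain static script can be sketched as a fixed, ordered list of questions keyed by domain. The `STATIC_SCRIPTS` table, the attribute names, and `run_static_script` are hypothetical, and the `answer` callable stands in for however the inquiring party actually responds.

```python
from collections.abc import Callable

# Hypothetical static scripts: one fixed, ordered list of
# (attribute, question) pairs per domain.
STATIC_SCRIPTS: dict[str, list[tuple[str, str]]] = {
    "restaurants": [
        ("cuisine", "What kind of cuisine would you like?"),
        ("hours", "What operating hours do you need?"),
        ("amenities", "What amenities do you want?"),
    ],
    "automobiles": [
        ("make", "What make of car are you looking for?"),
        ("price", "What is your price range?"),
    ],
}


def run_static_script(domain: str, answer: Callable[[str], str]) -> dict[str, str]:
    """Ask every question in the domain's script in its predetermined order."""
    responses: dict[str, str] = {}
    for attribute, question in STATIC_SCRIPTS[domain]:
        responses[attribute] = answer(question)
    return responses


print(run_static_script("automobiles", lambda question: "any"))
```

In this form every new domain needs another hand-written entry, and changing the question order means editing the stored list itself, which mirrors the memory and maintenance burdens discussed below.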
A database usually includes more than one domain, or grouping of information; a typical database may have over a thousand domains. As the database grows, the number of domains continues to increase, and more static scripts must be created in order to search it. Creating new static scripts can be time-consuming. Additionally, each new static script consumes valuable memory within the system, so the memory required keeps growing with the number of domains. As the number of domains increases, the memory requirements may become problematic, and the need to create a new static script for each domain can become burdensome.
Other problems may exist when searching databases with static scripts written for each information domain. For example, the order of questions in a static script may be inappropriate to the inquiry, or one or more questions may be superfluous. In a given inquiry, the first question in a static script may be a poor opening question because it does not help to focus the search of the database. In such a situation, it is undesirable to ask that question at the beginning. To remedy this, the questions can be rearranged. However, rearranging the questions usually requires viewing and editing the entire contents of the static script offline. In other words, the whole static script must be laboriously rebuilt offline, which is burdensome and time-consuming.
What if there is nothing in the database relating to one of the questions in the static script? As previously described, the system prompts the inquiring party with the questions in the static script according to a predetermined sequence. This can confuse the inquiring party if the database holds no information relating to a question, because the system is essentially asking about non-existent information. For example, one domain within the database may be related to restaurants, and the static script for that domain may include a series of standard questions about restaurants in a fixed order. These may include questions about the kind of cuisine, the operating hours (such as Monday through Friday or Saturday), and the desired amenities (such as valet parking or smoking sections). Typically, these questions are asked in the same sequence each time someone requests information about restaurants. However, if there were no information on restaurant amenities, a system using static scripts would still ask the inquiring party which amenities are desired. The system would then typically indicate that there is no information about restaurant amenities, or return a null set from the database. Asking a question without regard to the information available in the database wastes the inquiring party's time, drains the resources of the information retrieval system, and may return no information at all. This can be especially annoying when questions about unavailable information are asked repeatedly.
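The mismatch between a fixed restaurant script and the data actually available can be sketched as follows. The records, attribute names, and helper functions are hypothetical; the check for unanswerable questions only illustrates the problem described above and is not presented as any particular system's method.

```python
# Hypothetical restaurant records; none of them has an "amenities" value.
RESTAURANTS: list[dict[str, str]] = [
    {"name": "Casa Roja", "cuisine": "Mexican", "hours": "Mon-Fri"},
    {"name": "Lotus", "cuisine": "Thai", "hours": "Saturday"},
]

# The static restaurant script: fixed questions in a fixed order.
SCRIPT: list[tuple[str, str]] = [
    ("cuisine", "What kind of cuisine would you like?"),
    ("hours", "What operating hours do you need?"),
    ("amenities", "What amenities do you want?"),
]


def has_data(records: list[dict[str, str]], attribute: str) -> bool:
    """True if at least one record carries a value for the attribute."""
    return any(attribute in record for record in records)


def wasted_questions(script: list[tuple[str, str]],
                     records: list[dict[str, str]]) -> list[str]:
    """Questions the static script asks although no record can answer them."""
    return [q for attr, q in script if not has_data(records, attr)]


def matching(records: list[dict[str, str]], attribute: str, value: str) -> list[dict[str, str]]:
    """Records whose attribute equals the value (empty if the attribute is absent)."""
    return [r for r in records if r.get(attribute) == value]


print(wasted_questions(SCRIPT, RESTAURANTS))
print(matching(RESTAURANTS, "amenities", "valet parking"))
```

Whatever the caller answers to the amenities question, the lookup can only return an empty result, so the question costs a turn of the dialogue without narrowing the search.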
Another problem arises when an information retrieval system processes inquiries in a voice format. A voice inquiry is typically received from an inquiring party who uses a conventional telephone or other telephonic device to interact with the system. When processing a voice inquiry, the system typically performs some kind of voice recognition on it. Many voice recognition techniques rely on a vocabulary of words or terms that can be recognized. By comparing the voice inquiry to this vocabulary, the system is able to recognize certain words in the inquiry as terms from the vocabulary. However, if the vocabulary is large, recognition is usually slower and more prone to errors.
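The effect of vocabulary size can be illustrated with a toy matcher. Real voice recognizers compare acoustic features, not spellings; here, purely as an assumption for illustration, a garbled text transcription stands in for the audio and Levenshtein edit distance stands in for acoustic similarity. The vocabularies and the garbled input "thie" (for a caller who said "Thai") are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def recognize(heard: str, vocabulary: list[str]) -> list[str]:
    """Return every vocabulary term at the minimum distance from the input.

    One comparison is made per term, so the work grows with vocabulary size.
    """
    scores = {term: edit_distance(heard, term) for term in vocabulary}
    best = min(scores.values())
    return [term for term, d in scores.items() if d == best]


FULL_VOCABULARY = ["thai", "tie", "tire", "time", "mexican", "italian"]
RESTAURANT_VOCABULARY = ["thai", "mexican", "italian"]

print(recognize("thie", FULL_VOCABULARY))        # closest term is the wrong word
print(recognize("thie", RESTAURANT_VOCABULARY))  # smaller vocabulary yields "thai"
```

With the larger vocabulary the nearest term is "tie", a misrecognition; with a vocabulary limited to the terms relevant to the restaurant domain, the same input resolves to "thai" after fewer comparisons.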
In summary, there is a need for a system for providing a flexible set of questions within a script used when processing requests for information that (1) more efficiently processes an inquiry, (2) requires less memory when compared to static scripts for each domain, (3) can be easily modified without rebuilding the entire script, (4) is dynamically created according to what kind of information is desired and what information is available in the database, and (5) minimizes the voice recognition processing time and inaccuracies.