1. Field of the Invention
One or more embodiments of the invention are related to the field of software processes that provide answers to queries. Specifically, embodiments of the invention relate to methods of computing answer results to user-submitted queries customizable by one or more users without requiring keywords to operate, using a plug-in executor, a user database, a plug-in database, a content recognizer, an answer generator and a controller. Users may customize how answer engines respond to queries, answers may be generated by answer generator plug-ins created by other users, content recognizers may be implemented by content recognizer plug-ins created by other users, and answer generators may utilize content recognizers even if the content recognizers are not created by the same author. Not all content recognizers need to be executed for every user, and plug-in execution resources are bounded.
2. Description of the Related Art
Generally, users often access the Internet in search of answers to specific questions, rather than general search results, such as links to documents that contain keywords.
Generally, many users begin the process of finding answers by visiting a general search engine, such as Google or Bing. Google and Bing may be considered hybrid answer engines as they both use content recognition and knowledge databases to aid the search process, while indexing is still their main strength.
Typically, most general search engines only index the surface web. However, the deep web is generally much larger than the surface web. Due to the size of the deep web and the difficulty of automatically discovering its content, little, if any, deep web content is typically returned by general search engines. Therefore, if the answer to a user's question resides only in the deep web, a general search engine would not be able to supply the answer to the user.
Typical solutions that enable deep web content to be available in search engine results suffer from several key disadvantages. For example, U.S. Pat. No. 7,941,419 to Bhatkar et al., entitled “Suggested Content With Attribute Parameterization”, discloses system-provided URL templates to determine the location of content to fetch before fetching and returning the content to a user. U.S. Pat. No. 7,739,258 to Halevy et al., entitled “Facilitating Searches Through Content Which Is Accessible Through Web-Based Forms”, discloses a system that facilitates crawling through web-based forms to gather information using reverse engineering forms. U.S. Pat. No. 7,693,956 to Moricz et al., entitled “Accessing Deep Web Information Using A Search Engine”, discloses identifying and reconciling associated query parameters of queries of query answer web pages to a set of search criteria. Also, United States Patent Publication 20100057686 to Brenier et al., entitled “Deep Web Search”, discloses a data processing system and method for researching websites according to a user query. The systems of Moricz et al. and Brenier et al., for example, use custom understanding of forms to build an index of deep web content. United States Patent Publication 20130304758, to Gruber, et al., entitled “Crowd Sourcing Information to Fulfill User Requests”, discloses how to utilize user-contributed sources to answer queries.
Such typical solutions as discussed above require that the deep web content be accessible by the search engine. If the content is located on a private network accessible by the user, but not the search engine, a typical search engine will not be able to fetch or index the content. Even if the content is reachable, a search engine would typically fetch the content as a generic user, instead of as the user issuing the search request. In such a case, generally, the indexed content would not match what the user would see had he fetched the same location, and this may lead to omitting the location in the list of search results, even though the location would contain relevant information if fetched directly by the user.
Other typical solutions, such as United States Patent Publication 20050160083 to Robinson, entitled “User-Specific Vertical Search”, disclose techniques for performing user-specific searches, wherein if the external site hosting the content is reachable through the network, but requires authentication for access, the content could be fetched on behalf of the user if the search engine stores and transmits the user's credentials with its requests to external services. However, due to security risks, users generally may not be willing to provide their credentials for other sites to the search engine.
Another disadvantage of typical solutions for searching the deep web is that the set of deep web services that is available cannot be customized by untrusted users, or if it is, customization requires integration into the search engine, testing, and manual approval before a new source of deep web services becomes available. DuckDuckGo Goodies, described below, may act as a source of deep web services, and are an example of a solution that allows customization by users, using Goodies, but Goodies must be approved before becoming available. The requirement for the manual approval generally originates from the fact that executing untrusted code carries serious security implications, with both the execution itself and the content that may be generated.
In addition, known systems for searching the deep web have difficulty indexing a practically infinite amount of data that may be in one of a huge variety of formats. Considering a web form that takes as input a UPC-A code, for example, there are typically ten digits in a UPC-A code between a numbering system digit and a check digit; therefore there are about 10^10, or 10 billion, UPC-A codes in a given numbering system. Very few of the pages may be pruned from the index, because generally UPC-A codes are looked up more or less randomly. Indexing the entire set of pages returned by the form, which is only a single form among the millions on the Internet, would normally require 10 billion entries. To overcome this disadvantage, search engines like Google appear to perform content recognition on search queries, wherein for example, Google recognizes twelve digit codes as UPC-A codes, and information about the associated product is returned at the top of the result page. Alternatively, the set of known UPC-A codes may be indexed, which is much smaller than all the possible UPC-A codes. While these approaches are effective, users generally cannot customize the content recognition algorithm, limiting its usefulness. For example, a product manager who uses an issue tracking web application typically does not have the ability to customize Google to recognize an eight-digit number as an issue number for the manager's issue tracking application. As a result, Google is generally unable to return a link to the issue tracking web application in its search results when an eight-digit number is entered as a search query. Also, typically, each query needs to be checked for several, if not all, known content formats since there is no way for a user to customize his account to only recognize certain formats, in order to decrease his query response time or to disambiguate formats that match the same text.
The more format types that may be recognized, the more computing resources are consumed and, past a certain degree of parallelism, generally the longer the query will take to process. Other previously mentioned systems, such as United States Patent Publication 20130304758, to Gruber, that do not allow for user implemented content recognizers, also suffer from the same disadvantages.
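The content recognition discussed above may be illustrated with a minimal Python sketch. It is an illustration only and does not describe how Google or any other search engine is implemented. The function names, the `RECOGNIZERS` table, and the `enabled` list of per-user recognizers are hypothetical; the check digit rule is the standard UPC-A rule (three times the sum of the digits in odd positions, plus the sum of the digits in even positions, taken modulo 10).

```python
import re

# Hypothetical patterns: a UPC-A code is twelve digits, an issue number eight.
_UPC_A = re.compile(r"[0-9]{12}")
_ISSUE = re.compile(r"[0-9]{8}")


def upc_a_check_digit(first_eleven: str) -> int:
    """Compute the UPC-A check digit from the first eleven digits."""
    odd_sum = sum(int(d) for d in first_eleven[0::2])   # positions 1, 3, ..., 11
    even_sum = sum(int(d) for d in first_eleven[1::2])  # positions 2, 4, ..., 10
    return (10 - (3 * odd_sum + even_sum) % 10) % 10


def is_upc_a(query: str) -> bool:
    """Recognize a query that is exactly a UPC-A code with a valid check digit."""
    text = query.strip()
    if _UPC_A.fullmatch(text) is None:
        return False
    return upc_a_check_digit(text[:11]) == int(text[11])


def is_issue_number(query: str) -> bool:
    """Recognize an eight-digit issue number (an assumed, user-defined format)."""
    return _ISSUE.fullmatch(query.strip()) is not None


# Hypothetical registry of recognizers, keyed by name.
RECOGNIZERS = {"upc_a": is_upc_a, "issue_number": is_issue_number}


def recognize(query: str, enabled: list) -> list:
    """Run only the recognizers a given user has enabled."""
    return [name for name in enabled if RECOGNIZERS[name](query)]


# Ten free digits per numbering system digit give 10**10 possible codes.
POSSIBLE_CODES_PER_NUMBERING_SYSTEM = 10 ** 10
```

In this sketch, a user who enables only `issue_number` causes one recognizer to run per query rather than all of them, which is the kind of per-user reduction in computation that the systems described above do not offer.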
Instead of general search engines, users may consult answer engines, which perform computation to answer queries. A popular answer engine most often used for technical queries, for example, is Wolfram Alpha. Wolfram Alpha generally uses a computation engine and a knowledge database to provide answers to queries like “sin(pi/2)*3” or “atomic mass of carbon”. One limitation of this typical approach is that answer queries that involve computation need to be crafted in a single language, such as Mathematica. Furthermore, another limitation of such a typical approach is that the answers are generally provided only by Wolfram Alpha itself; hence users typically cannot direct the website to compute arbitrary functions, include answers from other websites, or inject knowledge into its database. An example of a website that does allow user contributions is Freebase, which exposes a database of human knowledge. Freebase, generally, relies on the community to provide knowledge, but it only contains about 2 billion facts as of 2013, while the number of useful facts is likely to be orders of magnitude larger.
Typically, there are many questions asked by real users that have answers that may be easily computed, but not by Freebase or any other knowledge engine, for example. Furthermore, Freebase generally stores only data, but not algorithms to compute answers. Therefore, typical knowledge engines generally cannot, for example, respond meaningfully to a query such as, “What is the 5 day moving average of the stock price of IBM?”, if the answer has not been pre-computed and stored already.
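To illustrate why such an answer may be computed rather than stored, the following minimal Python sketch computes a moving average over a list of closing prices. The function name and the price values are hypothetical placeholders, not actual stock data.

```python
def moving_average(prices, window=5):
    """Return the simple moving average for each full window of prices."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(prices) < window:
        return []  # not enough data points for a single window
    return [
        sum(prices[i:i + window]) / window
        for i in range(len(prices) - window + 1)
    ]


# Placeholder closing prices for six consecutive trading days.
closing_prices = [1, 2, 3, 4, 5, 6]
averages = moving_average(closing_prices, window=5)
```

The stored facts are only the daily prices; the answer depends on the window and the date range in the query, so it would have to be pre-computed for every combination in order to be served from a database of facts alone.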
DuckDuckGo, for example, is an answer engine that typically allows users to upload plug-ins called Goodies that generate answers, either in the form of an inline answer (the content of the answer) or a link to a page that contains the answer. These plug-ins, generally, are written in Perl, a relatively insecure programming language, and must generally be approved by the management of DuckDuckGo before going live. Most likely, the approval step is required because executing untrusted code is a performance risk and a security risk. Once approved, the content in an inline answer produced by a plug-in, typically, is inserted directly into answer pages. In the DuckDuckGo answer engine, generally, all plug-ins are executed, regardless of the identity of the user. If a malicious plug-in is inadvertently approved, for example, the DuckDuckGo answer engine may be attacked, or an answer page may be injected with script that steals cookies or otherwise compromises any user's security.
Using the DuckDuckGo answer engine, generally, Goodies must individually implement content recognition in order to generate different results for the same type of content. For example, if one Goodie outputs the estimated house price at an address, and another Goodie outputs the property tax information for an address, both Goodies typically need to implement address recognition, and both recognition algorithms need to execute. Due to DuckDuckGo's lack of user accounts, generally, users cannot select which Goodies to run, as all Goodies are always running. The number of recognition routines that need to be executed typically increases linearly with the number of Goodies, which may not scale indefinitely.
In “Ranking Suspected Answers to Natural Language Questions Using Predictive Annotation”, by D. Radev, J. Prager, and V. Samn, an answer service, called GuruQA, is developed to use indexed documents to answer questions. GuruQA uses a pre-defined set of content recognizers to mark answer query terms as belonging to a few categories. Since documents need to be reachable to be indexed, GuruQA suffers from the inability to access the deep web. Even if the entire web could be indexed, for example, GuruQA cannot perform computations like “square root of 5.33” since it relies on the answer being in a document. As stated by the authors, GuruQA also generally does not handle “how” and “why” questions very well, as it needs to use artificial intelligence to understand the documents it indexes. Finally, it is not possible in GuruQA for users to define content recognizers or strategies to answer questions.
Other typical search engines, such as vertical search engines, may also provide answers. Given that a segment is known, a vertical search engine is likely to return more relevant answers than general search engines due to its use of domain specific knowledge and focused crawling of sites known to be relevant to the segment. However, vertical search engines are usually not appropriate to use as starting points to determine any kind of answer. Generally, the vertical search engine needs to be selected a priori by the user. Furthermore, most vertical search engines use indexing web crawlers, wherein like general search engines, they are also unable to return many results from the deep web. United States Patent Publication 20050160083 to Robinson appears to show how a vertical search engine could retrieve personalized results from the deep web. However, such a system generally requires that: (1) the search engine has access to the content, which could be in a private network, but still be available to the user; and (2) the user is willing to share with the search engine his credentials to access other external services, which may be a security risk.
Meta-search engines, typically, may increase the breadth of general search engines, save the user from searching multiple sites, and allow the possibility of user customization of the choice and implementation of the search engines that are consulted. Typically, meta-search engines combine results from other general search engines. For example, DogPile combines results from Google, Yahoo!, and Yandex. Meta-search engines like DogPile generally suffer from the same disadvantage as general search engines, which is the difficulty of returning results from the deep web. Typical solutions such as European Patent 1158421 to Bae et al., and U.S. Pat. No. 7,487,144 to Shakib et al., allow for greater customization, but generally do not simplify the manner of returning results from the deep web. In the system of Bae et al., for example, each search engine is typically only activated when triggered by keywords; therefore the user has no other control over the activation or priority of the vertical search engines. But, generally, a search query might not have any keywords, for example when the query consists only of a number.
If a user submits search queries by selecting text, the selected text is unlikely to contain the keywords needed to activate the vertical search engine. Also, a user may want to ensure a particular search engine's results are favored over those of another, such that it takes less time for the user to find the first search engine's results in the list of search results. In the system of Shakib et al., for example, a user may select a vertical search engine in his preferences, which positively influences the possibility of the vertical search engine being used to service the user's query. However, the logic for selecting a vertical search engine and ordering its results is made by a “vertical determinator”, which is generally not customizable by the user. As such, queries without keywords associated with a given vertical search engine, usually, are unlikely to activate the vertical search engine. U.S. Pat. No. 7,373,351 to Wu et al., entitled “Generic Search Engine Framework”, discloses a knowledge database used to determine which search engines are selected. However, the system of Wu et al. suffers from the same disadvantages discussed above.
Typical answer engines, knowledge engines, vertical search engines, and meta-search engines are usually not optimal for finding answers. Many users looking for answers, typically, use a general search engine to find the right website(s) to look for answers, then enter their query in the discovered website(s). For example, a user looking up the English definition of a word in Chinese might search for “Chinese English dictionary” on google.com, which would return a list of sites, such as http://www.zhongwen.com. Typically, the user would then select the link to http://www.zhongwen.com, and finally enter the word to look up. If the user visits ZhongWen.com frequently, the user may generally bookmark the site, or remember the site address, but the next time the user wants to look up a word, the user would typically have to either find the bookmark, or recall the site address and enter it in a browser. If there are multiple websites that the user wants to use to look up the word, for example, the user generally must find each site and enter the word on every website. This process quickly becomes tedious, especially when looking up several words.
Some existing solutions for removing the need to enter queries on multiple web sites rely on the assumptions that (1) the best answer pages available on the web are most likely to be the result of submitting forms on websites built for a specialized purpose, and (2) submitting such a form frequently results in an HTTP GET request of a URL that contains the search query, possibly after a transformation like URL encoding. As such, answer URLs are typically generated by taking a URL template and substituting parts of the search query into the URL template. URL templates are generally used in products like Search Center and Selection Search, both Google Chrome extensions. The output URLs are then typically either loaded into a browser directly or compiled in a result page. With such typical solutions, however, (1) the URLs are built on the client side, therefore the set of sites that may be searched may only be upgraded by the user updating the browser plug-in or by defining the URL template directly; (2) knowledge on the effectiveness of URL templates for all users is not accumulated, therefore the user may not know which URL templates to select; (3) the user must specify which sites or categories to use before submitting a query, otherwise many irrelevant answers will be returned; (4) a custom browser or browser plug-in needs to be downloaded and installed; and (5) inline answers are not supported, such as the answer to “second prime number” being “3”.
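The URL template substitution described above may be sketched as follows. This is a minimal Python illustration under stated assumptions: the `{query}` placeholder syntax, the function name, and the `dictionary.example` address are hypothetical and are not taken from Search Center, Selection Search, or any other product.

```python
from urllib.parse import quote_plus


def fill_url_template(template: str, query: str) -> str:
    """Substitute the URL-encoded search query into a URL template.

    The "{query}" placeholder is an assumed convention for this sketch.
    """
    return template.replace("{query}", quote_plus(query))


# Hypothetical template for a dictionary site that answers via HTTP GET.
template = "https://dictionary.example/lookup?word={query}"
answer_url = fill_url_template(template, "a b&c")
```

Because the substitution is a pure string operation, it may run on either the client or the server; the disadvantages enumerated above arise from where the templates are stored and selected, not from the substitution itself.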
United States Patent Publication 20020069194 to Robbins, entitled “Client Based Online Content Meta Search”, appears to disclose how to download URL templates from a server, and execute them on the client. Once the URL is formed and downloaded, the client can parse the contents to extract the relevant information, and present it in the search results. Since the URL templates are stored on the server side, knowledge of the availability and effectiveness of URL templates from multiple users can be accumulated on the server. However, the system of Robbins appears to require that the user specify which sites or categories to use before submitting a query, or else too many sites will be accessed and too many results will be returned, and a custom browser or browser plug-in needs to be downloaded and installed. Also, extracting and presenting information found by downloading URLs may be computationally expensive, use network resources unnecessarily, or violate the hosting site's terms of service.
United States Patent Publication 20120136887 to Cha et al., entitled “Method and System for Providing Multifunctional Search Window Service Based On User-Generated Rules”, appears to disclose utilizing user created rules for creating URL templates, and allowing users to customize the search results by selecting and prioritizing such rules. However, in the system of Cha et al., the rules appear to be triggered by keywords, and if keywords are not present, the default rules, which may be too general for some search queries, may output too many irrelevant answers.
U.S. Pat. No. 7,941,419 to Bhatkar et al. appears to utilize URL templates in a server-side solution, but fetches the content from the URL before returning it to the user. As such, the system of Bhatkar et al. typically is unable to return results that are not available on the server-side, even if they are available on the client-side. Also, the system of Bhatkar et al. generally relies on “triggering words” to be present in a query to determine which URL templates to use.
Search result pages returned by general search engines typically contain a list of results, each having a link to another site. The user typically needs to manually click each link to view the search result, which either: (1) opens the link in the same tab or (2) opens the link in another browser tab. In the case of option (1), the user typically needs to go back to the search results page to view the other results. In the case of option (2), the new browser tab must typically be closed manually even if the user initiates another search.
Another type of user interface for viewing search results allows users to switch tabs within the same page, with each answer page in an iframe activated by a tab. An example of such a typical search engine is discussed in U.S. Pat. No. 6,912,532 to Andersen, entitled “Multi-Level Multiplexer System for Networked Browser”. In using the system of Andersen, with current websites, each iframe may not be displayed at all because many websites now send an X-Frame-Options header in their HTTP responses which prevents their content from being displayed in iframes in order to prevent click-jacking attacks.
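The effect of the X-Frame-Options header may be illustrated with a minimal Python sketch that inspects response headers. The function name is hypothetical, and the sketch assumes that the answer page and the page embedding it are on different origins, so that both the `DENY` and `SAMEORIGIN` values prevent display. It does not consider other mechanisms, such as a Content-Security-Policy `frame-ancestors` directive.

```python
def blocks_cross_origin_framing(headers: dict) -> bool:
    """Return True if X-Frame-Options forbids display in a cross-origin iframe."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "x-frame-options":
            return value.strip().upper() in {"DENY", "SAMEORIGIN"}
    return False  # header absent: this check alone does not forbid framing


blocked = blocks_cross_origin_framing({"X-Frame-Options": "SAMEORIGIN"})
allowed = not blocks_cross_origin_framing({"Content-Type": "text/html"})
```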
If the user is using an effective answer engine, typically, the results are more likely to be what the user is seeking; as such, alternative user interfaces for presenting search results may be more appropriate. Generally, only presenting links on a single page may be the result of search engines typically earning revenue when such links are manually clicked. Client-side solutions, such as Search Center, Selection Search, and the system discussed in United States Patent Publication 20020069194 to Robbins, generally have the advantage that with a custom browser or browser plug-in installed, the process of searching for answers may be streamlined. With tools like Selection Search, search queries may typically be selected on existing web pages instead of entered manually into an input box. Search Center, Selection Search, and other similar plug-ins are typically purely client-side solutions; therefore users do not get the benefit of search plug-ins created by other users.
Another typical solution for the search user interface is described in U.S. Pat. No. 7,747,626 to Grimm et al., entitled “Search Results Clustering in Tabbed Browsers”, wherein search results are opened in another browser window's tabs, therefore the user may see answer pages instantly, without having to scan a result page and click links. However, the system of Grimm et al. typically loads all answer pages immediately after the search engine returns the search result. As such, if there are many search results, the browser may be asked to execute tens or even hundreds of web requests at the same time, and each web page could in turn load dozens of external images, CSS files, and JavaScript files. The concurrent requests typically slow the loading of the first result page unnecessarily, since the user does not see the other result pages until after he clicks the other browser tabs. Additionally, the concurrent requests may overutilize the capacity of the client, which may be a mobile device connected to the network with a low bandwidth connection. Further, the system of Grimm et al. generally does not include a mechanism of learning from user activity about which results are better than others. Generally, prior art user interfaces for displaying web search results in browser tabs do not handle the possibility of results containing inline answers, as opposed to answer locations.
In summary, the answer engine systems described above lack one or more of the following capabilities: enabling answer engine users to customize the operation of an answer engine; enabling an answer engine to be implemented by a developer community instead of a monolithic entity; enabling an answer generation process to occur without keywords or a pre-defined grammar; enabling content providers to leverage content recognition capabilities in other plug-ins; enabling the reduction of computation required for content recognition based on the user's preferences; enabling new answer generation methods to come online and change the behavior of the answer engine without server restarts, manual approval, or significant security risks; enabling the return of results from the deep web, including results in private networks and results generated with knowledge of the user's identity; and allowing content providers to control when users are directed to their sites. Furthermore, the user interfaces that display answers lack one or more of the following capabilities: enabling users to view multiple search results without leaving the currently viewed window by selecting or inputting text; enabling users to view multiple search results without loading each search result immediately; streamlining the process for users to search for multiple items in succession; and ranking plug-ins and their generated answer results based on user interaction with a browser. For at least the limitations discussed above, there is a need for methods of computing and presenting answer results to user-submitted queries.