One of the most useful and successful applications for searching of the Internet (whether from a fixed location such as a desk-top computer/workstation or from a mobile device, e.g., from a personal computing assistant or hand held computing device) is for the provision of information to the user that is constrained in certain aspects, i.e., is multidimensionally constrained. This could be, e.g., scheduled-event information that is constrained by both location and time, and also, e.g., by the type of event. People appreciate the power and convenience of the Internet (sometimes referred to as its subset, the World Wide Web or simply the Web) in collecting such types of information, e.g., for the purpose of populating personal event calendars with the extracted event information. The information is thus application specific, i.e., it is used with an application resident on the user's computing device, e.g., the calendar, and it is multidimensionally constrained, e.g., for a specific time and a specific location for a specific event from a selected type of events or multiple types of events, e.g., sporting events and entertainment events and the like.
This is evidenced by the popularity of websites such as digitalcity.com that provide information on cultural events for various cities. The Vindigo.com service, which has over 500,000 users, has demonstrated that obtaining location-based event information on a PDA in real time is very popular with mobile users. Yet, for all its power, searching libraries of searchable documents containing relevant information, e.g., web-pages on the Internet, for interesting events that fit the user's time and location constraints can still require too much effort and frustration on the part of the user, especially if the user's interests singularly or collectively do not fit the relatively few categories available on any single web-site or even a relatively few web-sites.
Will “Phantom of the Opera” be playing anywhere in South Dakota this fall, and if so, can the user fit it into the user's schedule? Trying to answer this question today requires a lot of energy and time visiting multiple search engines and following links. It would be much more convenient to be automatically notified of events of interest to the user, regardless of whether or not they are too obscure to be listed on the existing Web calendar sites.
General-purpose search engines on the Web that search based on specific keywords or patterns of links are well known, for example Google.com, AltaVista.com, HotBot.com, etc. They do not, however, have the ability to push events to users based on their interests. Additionally, at present, the web-sites that do exist that are capable of searching and retrieving event information in a few select categories, retrieve information from an event database that is manually compiled and updated using event lists from specific content providers, such as SportsTicker, MovieFone, etc. This severely limits the scope of event information available from these sites. Because of the manual compilation and scaling issues, the categories are necessarily broad and limited to the most popular ones. The power of the Internet lies in its ability to supply very specialized data to large numbers of users economically and tailored to each individual's needs. Existing content-oriented, e.g. event-oriented, Web information services have not shown the ability to exploit the full power of the Internet.
Thus the need exists for a content-oriented, e.g., scheduled-event oriented, Internet service that can automatically mine event information from the Web; organize it along the dimensions of selected constraints of a multidimensional set of application specific constraints, e.g., location, time, and category dimensions; and supply it in customized fashion to each user, e.g., that is useable directly by an application resident on the user's personal computing device, including over the Internet, via, e.g., fixed wire or wireless communication. By automating the collection of the multidimensional information, e.g., the event information, scaling properties will be greatly improved and the category quantization can be much finer, which means a much better match can be made with the user's particular application, e.g., with the user's specific sporting, entertainment, or professional interests and availability according to the user's schedule. Users of both fixed and mobile computing/information devices can, therefore, have a versatile and convenient service for retrieving application specific information, e.g., event information directly from queries made by the user applicable to specific types of information, and, if the user desires, for automatically pushing the application specific information, e.g., event information to the user's calendar. The application specific multidimensional information which matches the user's specific application requirements can be provided automatically and dynamically and utilized by the user's specific application program to automatically and dynamically provide the user with the desired final information, e.g., the placement on the user's electronic calendar of an event of interest to the user and which is not in conflict with the user's existing schedule and/or should be evaluated by the user to select between the newly added event and an already scheduled event. 
Overloading the user with irrelevant or uninteresting information, e.g., event information and excessive searching under the user's direction of legions of information source locations, e.g., web-pages in web-sites on the Internet, can be eliminated.
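The organization of event information along location, time, and category dimensions, and the check against the user's existing schedule, can be pictured with a minimal sketch. The `Event` record and the `match_events` and `push_to_calendar` functions below are hypothetical names introduced only for illustration; they are not part of any disclosed system, and the exact-match location test is a simplifying assumption.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """Hypothetical record: one event described along the example dimensions."""
    title: str
    category: str   # e.g., "theater" or "sports"
    location: str   # e.g., a city name (assumed to match exactly)
    start: datetime
    end: datetime


def overlaps(a, b):
    """True when the two events share any span of time."""
    return a.start < b.end and b.start < a.end


def match_events(candidates, categories, location, window_start, window_end):
    """Keep only the events that satisfy every constraint dimension."""
    return [
        e for e in candidates
        if e.category in categories
        and e.location == location
        and window_start <= e.start
        and e.end <= window_end
    ]


def push_to_calendar(calendar, matches):
    """Add non-conflicting matches; return those the user must evaluate."""
    needs_review = []
    for event in matches:
        if any(overlaps(event, booked) for booked in calendar):
            needs_review.append(event)   # conflicts with an existing entry
        else:
            calendar.append(event)       # free slot, so place it directly
    return needs_review
```

In this sketch a matching event that collides with an existing calendar entry is returned for the user to evaluate rather than being dropped, which mirrors the choice described above between a newly added event and an already scheduled one.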
At present there are several known methods for the automatic extraction of information from information source locations, e.g., web documents, i.e., web-pages on web-sites. Some examples are listed below. Y. Yang, J. G. Carbonell, R. D. Brown, T. Pierce, B. T. Archibald, and X. Liu, Learning Approaches for Detecting and Tracking News Events, IEEE Intelligent Systems, pp 32-43, July/August, 1999 (the disclosure of which is hereby incorporated by reference) disclose the extension of some of the popular supervised and unsupervised learning algorithms to allow document classification based on the information content and temporal aspects of, e.g., news events. The disclosed system is capable of detecting relevant events from large volumes of news stories, presenting abstracts of events in a hierarchical fashion, and tracking events of interest based on a user-given list of sample stories. This work is an example of topic detection and tracking as discussed in J. Allan et al, Topic Detection and Tracking Pilot Study: Final Report, DARPA Broadcast News Transcription and Understanding Workshop, Morgan Kaufmann, San Francisco, 1998, pp 194-218 (the disclosure of which is hereby incorporated by reference). In G. Barish, C. A. Knoblock, Y. S. Chen, S. Minton, A. Philpot, and C. Shahabi, Theaterloc: A Case Study in Information Integration, in IJCAI Workshop on Intelligent Information Integration, Stockholm, Sweden, 1999 (the disclosure of which is hereby incorporated by reference), the authors present a technique to efficiently learn extraction rules for obtaining information about movie theatres and restaurants from Web-based entertainment guides. An approach to automatically learn propositional rules to identify the name of a person given on their home page was disclosed in D. Freitag, Information Extraction from HTML: Application of a General Machine Learning Approach, in Proceedings of the 15th National Conference on Artificial Intelligence, pages 517-523, 1998 (the disclosure of which is hereby incorporated by reference).
Another approach concentrating on extracting relational information between pages on the web is disclosed in S. Slattery and M. Craven, Combining Statistical and Relational Methods for Learning in Hypertext Domains, in Proc. of the 8th International Conference on Inductive Logic Programming (ILP-98), 1998 (the disclosure of which is hereby incorporated by reference). In this work, the authors disclose the use of relational learning to identify advisor-advisee relations between faculty and graduate students using text and hyperlinks contained in the web pages. In R. Ghani, R. Jones, D. Mladenic, K. Nigam, S. Slattery, Data Mining on Symbolic Knowledge Extracted from the Web, Proceedings of the KDD-2000 Workshop on Text Mining, pages 29-36, Boston, Mass., August, 2000 (the disclosure of which is hereby incorporated by reference), the authors extract information about corporations across the world from resources on the web. Data mining is then performed on the created knowledge base. The authors claim that the results indicate that there is indeed promise in automatically learning new things from the web. In the paper A. McCallum, K. Nigam, J. Rennie, and K. Seymore, Building Domain-Specific Search Engines with Machine Learning Techniques, AAAI-99 Spring Symposium on Intelligent Agents in Cyberspace (1999), the authors describe the Ra Project, which uses machine learning methods in an effort to create and automate domain-specific search engines. The paper presents efficient spidering via reinforcement learning, extracting topic-relevant sub-strings, and building a topic hierarchy. The techniques of wrapper induction as disclosed in N. Kushmerick, D. Weld, and R. Doorenbos, Wrapper Induction for Information Extraction, in Proc. of the 15th International Conference on Artificial Intelligence, pp 729-735, 1997 utilize learning algorithms that are capable of extracting propositional knowledge from highly structured, automatically generated web pages.
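A wrapper of the kind referred to above can be pictured, in its simplest form, as a pair of left and right delimiters per field that is applied to a machine-generated page. The sketch below is illustrative only: the delimiters are written by hand, whereas a wrapper induction system would learn them from labeled example pages, and the function names `extract_between` and `apply_wrapper` as well as the sample markup are assumptions made for this example.

```python
def extract_between(page, left, right):
    """Return every substring of `page` enclosed by `left` ... `right`."""
    values = []
    pos = 0
    while True:
        start = page.find(left, pos)
        if start == -1:
            break                       # no further left delimiter
        start += len(left)
        end = page.find(right, start)
        if end == -1:
            break                       # unterminated field, stop scanning
        values.append(page[start:end].strip())
        pos = end + len(right)
    return values


def apply_wrapper(page, rules):
    """Apply one (left, right) delimiter pair per field and build records.

    `rules` maps a field name to its delimiter pair. The i-th value found
    for each field is assumed to belong to the i-th record on the page.
    """
    columns = [extract_between(page, left, right) for left, right in rules.values()]
    return [dict(zip(rules.keys(), row)) for row in zip(*columns)]


# Hypothetical, regularly structured page and hand-written delimiter rules.
PAGE = (
    "<tr><td class='t'>Phantom of the Opera</td><td class='d'>2001-10-06</td></tr>"
    "<tr><td class='t'>Jazz Night</td><td class='d'>2001-10-12</td></tr>"
)
RULES = {
    "title": ("<td class='t'>", "</td>"),
    "date": ("<td class='d'>", "</td>"),
}
records = apply_wrapper(PAGE, RULES)
```

Rules of this kind depend entirely on the regular layout of the page and use no linguistic information, so they break down on free text or on pages whose markup varies.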
The art does not disclose the automatic extraction of multidimensional application specific information from a library of information source documents, such as the automatic extraction of event information from Web documents.
From a commercial perspective, multiple event- and calendar-oriented web-sites and services have been developed in response to the need for event tracking software, but they lack automatic scheduled-event compilation. For example, an event Web site called when.com was recently purchased by America Online to provide personalized event directories and calendar services for users. However, when.com's approach suffers from the manual compilation limitations discussed above. Other search engines for monitoring events are also available on the Web, some of which are listed below in Table 1. They also have limitations similar to when.com.
TABLE 1. Partial list of websites for obtaining scheduled-event information

| Web Sites | Main features | Limitations |
| www.when.com | Directory of select event categories (sports, book and movie releases, etc.); personalized calendar with capability of adding and tracking specific events | Manually created event directory; no time and place query for searching events |
| www.palm.net (Event Club) | Time and place query search for US and select international cities | Manually created event directory; no time and place query for searching events |
| www.whatsgoingon.com | Time, place and event query search for select events in US and select international cities | Manually created event directory; no calendar features |
| www.event.net | Directory of select event categories; mainly for organizing and planning events (such as parties, movie, etc.) | Manually created event directory; no time and place based query search |
| www.expoworld.net | Meta-site and search engine linking event related Search Tools; mainly for events and international trade communities worldwide | Manually created directory and links; only for trade shows; more suitable for planning events |
There have been several notable efforts in eliciting information from, e.g., highly structured web-documents. In Doorenbos, R., Etzioni, O., Weld, D. S., A Scalable Comparison-Shopping Agent for the World Wide Web, in Proc. of the First International Conference on Autonomous Agents, 1997 (the disclosure of which is hereby incorporated by reference), the authors investigate the effectiveness of intelligent information extraction agents via a case study called ShopBot. As reported, ShopBot is a fully implemented, domain-independent comparison-shopping agent. The agent automatically learns how to shop at different E-commerce sites and then garners product information in an effort to assist the user with a survey of the product price across shops. In M. Craven, D. Dipasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam, S. Slattery, Learning to Extract Symbolic Knowledge from the World Wide Web, Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98) (the disclosure of which is hereby incorporated by reference), the authors report the development of a trainable information extraction system that takes two inputs: an ontology defining the classes and relations of interest, and a set of training data. The training data consists of tagged segments of hypertext that represent instances of the selected classes and relations. Once the system is trained, it can extract information from other pages on the web. The authors report the use of a modified naïve Bayes approach to classifying web pages into different pre-established classes. In D. Freitag, Information Extraction from HTML: Application of a General Machine Learning Approach, in Proceedings of the 15th National Conference on Artificial Intelligence, pages 517-523, 1998 (the disclosure of which is hereby incorporated by reference), the author reports the use of SRV, a relational learning system that automatically learns extraction rules from a domain consisting of university courses and research pages from the Web. N. Kushmerick, D. Weld, and R. Doorenbos, Wrapper Induction for Information Extraction, in Proc. of the 15th International Conference on Artificial Intelligence, pp 729-735, 1997 (the disclosure of which is hereby incorporated by reference), discuss wrapper induction methods for information retrieval. In their reported approach, they use wrappers to effectively extract information from web-pages that are generated based on HTML. The wrapper induction based systems generate delimiter-based rules and do not use linguistic constraints. Other examples of agents capable of automatically extracting information from the Web include WHISK, as reported in S. Soderland, Learning Information Extraction Rules for Semi-Structured and Free Text, Machine Learning, 34, 233-272, 1999; RAPIER, as reported in M. Califf and R. Mooney, Relational Learning of Pattern-Match Rules for Information Extraction, Working Papers of the ACL-97 Workshop in Natural Language Learning, pp 9-15, 1997; CRYSTAL, as reported in S. Soderland, D. Fisher, J. Aseltine, W. Lehnert, CRYSTAL: Inducing a Conceptual Dictionary, Proc. of the 14th International Joint Conference on Artificial Intelligence, pp 1314-1319, 1995; and Webfoot, as reported in S. Soderland, Learning to Extract Text-Based Information from the World Wide Web, in Proceedings of the Third International Conference of Knowledge Discovery and Data Mining, KDD-1997 (the disclosure of each of which is hereby incorporated by reference). In Doorenbos, R., Etzioni, O., Weld, D. S., A Scalable Comparison-Shopping Agent for the World Wide Web, in Proc. of the First International Conference on Autonomous Agents, 1997 (the disclosure of which is hereby incorporated by reference), the authors claim that most of the learning agents that are in vogue seem to concentrate more on learning about the user's interests than on learning about the resources they access. The present invention involves understanding the Web documents to elicit event information in the context of user interests which are specified explicitly by the user.
Inductive learning techniques are also well known in the art, such as CN2, discussed in P. Clark, and T. Niblett, The CN2 Induction Algorithm, Machine Learning, 3(4), pp 261-263, 1989; SRV, discussed in D. Freitag, Information Extraction from HTML: Application of a General Machine Learning Approach, in Proceedings of the 15th National Conference on Artificial Intelligence, pages 517-523, 1998; C5, discussed in J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, Los Altos, Calif., 1992; and FOIL, discussed in J. R. Quinlan, and R. M. Cameron-Jones, FOIL: A Midterm Report, in Proc. of the 12th European Conference on Machine Learning, 1993 (the disclosures of which are hereby incorporated by reference).