The present invention is in the field of Internet navigation including various communication and connection technologies and pertains more particularly to methods and apparatus, including software, for obtaining information from semi-structured, WEB-based data sources for presentation to users.
The information network known as the World Wide Web (WWW), which is a subset of the well-known Internet, is arguably the most complete source of publicly accessible information available. Anyone with a suitable Internet appliance such as a personal computer with a standard Internet connection may access (go on-line) and navigate to information pages (termed web pages) stored on Internet-connected servers for the purpose of garnering information and initiating transactions with hosts of such servers and pages.
Many companies offer various subscription services accessible via the Internet. For example, many people now do their banking, stock trading, shopping, and so forth from the comfort of their own homes via Internet access. Typically, a user, through subscription, has access to personalized and secure WEB pages for such functions. By typing in a user name and a password or other personal identification code, a user may obtain information, initiate transactions, buy stock, and accomplish a myriad of other tasks.
One problem that is encountered by an individual who has several or many such subscriptions to Internet-brokered services is that there are invariably many passwords and/or log-in codes to be used. Often a same password or code cannot be used for every service, as the password or code may already be taken by another user. A user may not wish to supply a code unique to the user such as perhaps a social security number because of security issues, including quality of security, that may vary from service to service. Additionally, many users at their own volition may choose different passwords for different sites so as to have increased security, which in fact also increases the number of passwords a user may have.
Another issue that can plague a user who has many passworded subscriptions is the fact that they must bookmark many WEB pages in a computer cache so that they may quickly find and access the various services. For example, in order to reserve and pay for airline travel, a user must connect to the Internet, go to his/her book-marks file and select an airline page. The user then has to enter a user name and password, and follow on-screen instructions once the page is delivered. If the user wishes to purchase tickets from the WEB site, and wishes to transfer funds from an on-line banking service, the user must also look for and select the personal bank or account page to initiate a funds transfer for the tickets. Different user names and passwords may be required to access these other pages, and things get quite complicated.
Although the preceding example is merely illustrative, it is generally known that much work related to finding WEB pages, logging in with passwords, and the like is required to successfully do business on the WEB.
A service known to the inventor, and described in the disclosure referenced by Ser. No. 09/208,740 listed under the cross-reference to related documents section, allows a user to store all of his password-protected pages in one location such that browsing and garnering information from them is much simplified. A feature of this service allows a user to program certain tasks into the system such that requested tasks are executed by a software agent based on user instruction. The service stores user password and log-in information and uses the information to log in to the user's sites, thus enabling the user to navigate without having to manually input log-in or password codes to gain access to the links.
The above-described service uses a server to present a user-personalized application that may be displayed as an interactive home page containing all of the user's listed sites (hyperlinks) for easy navigation. The application lists the user's URLs in the form of hyperlinks such that a user may click on a hyperlink and navigate to the page, wherein login, if required, is automatic and transparent to the user.
The application described above also includes a software agent that may be programmed to perform scheduled tasks for the user, including returning specific summaries and updates about user-account pages. A search function is provided and adapted to cooperate with the software agent to search user-entered URLs for specific content if such pages are cached somewhere in their presentable form, such as at the portal server or on the client's machine.
A further enhancement to the system described above is known to the inventor and described in the disclosure of application Ser. No. 09/323,598 also listed under the cross-reference section. The described enhancement consists of a means for obtaining information from WEB-based sources using a site-navigation script, a field template, and a means for parsing data. The navigation script follows site logic of a target WEB site containing the data for return to a user. Part of the template includes the description and location of the data requested by a user. A parsing engine acts to identify the new data for retrieval for a user. In this way, WEB summaries may be compiled on updated data at user-frequented sites.
There are certain limitations to the method described above in that an adequate description and location of the target data must be provided before the system may navigate to and parse the available data. The above system is designed to work with structured data wherein the target data appears in a same location or "field" time after time. Structured data is data that resides in a table, form, or other template format designed to contain the data. In some cases however, data is presented in a semi-structured fashion meaning that a desired chunk of data is not logically identifiable to a specific field, column, line, or table location wherein the data appears time after time. Identifying and retrieving information from semi-structured data sources can be extremely complicated.
A good example of semi-structured data would be news headlines followed by summary text. There may be a differing number of headlines presented on a news page on any given day, and the summaries under the headlines may take up varying amounts of space between headlines, so that a given headline position falls in a different location from day to day. Moreover, the summaries may be varied in format, style and so on. A news site containing headlines and summaries in list fashion represents a semi-structured site wherein data appears in different locations at different times. While a user may parse the entire page for data that matches a key word or phrase, the data is extracted out of context and may be meaningless to a user without the surrounding text.
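The headline example can be made concrete with a minimal sketch. Everything in it is hypothetical and for illustration only: the two sample pages, the "HEADLINE: " marker used to recognize a headline, and the function names are assumptions, not part of any referenced system. It shows that reading a fixed location returns unrelated content on different days, while grouping each headline with its summary keeps a keyword match in context.

```python
# Hypothetical news pages: each headline is followed by a variable
# number of summary lines, so positions shift from day to day.
page_monday = [
    "HEADLINE: Markets rally",
    "Stocks rose sharply on Monday.",
    "HEADLINE: Storm nears coast",
    "Residents were told to prepare.",
]
page_tuesday = [
    "HEADLINE: Markets slip",
    "Stocks fell on Tuesday.",
    "Analysts cited profit taking.",
    "HEADLINE: Storm makes landfall",
    "Power outages were reported.",
]

MARKER = "HEADLINE: "  # assumed marker; real pages would need another cue


def field_at(page, line_index):
    """Structured-style extraction: read whatever sits at a fixed location."""
    return page[line_index]


def headline_items(page):
    """Group each headline with the summary lines that follow it."""
    items = []
    for line in page:
        if line.startswith(MARKER):
            items.append({"headline": line[len(MARKER):], "summary": []})
        elif items:
            items[-1]["summary"].append(line)
    return items


def find_in_context(page, keyword):
    """Return whole headline-plus-summary items that mention the keyword."""
    keyword = keyword.lower()
    return [
        item
        for item in headline_items(page)
        if keyword in item["headline"].lower()
        or any(keyword in text.lower() for text in item["summary"])
    ]
```

With these sample pages, `field_at(page, 2)` yields a headline on Monday but a summary line on Tuesday, whereas `find_in_context(page_tuesday, "storm")` returns the storm headline together with its summary.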
What is clearly needed is a method and apparatus that enables a user to request and receive information from semi-structured data sources. Such a system would provide effective summarization of data for user-visited sites wherein data does not follow a predictable structure or is fragmented over a significant portion of a WEB page.
In a preferred embodiment of the present invention a configurable Internet WEB search system is provided, comprising a browser module for navigating to and displaying a WEB page; a block selection and configuration function having input tools for a user to select at least one block portion of a displayed WEB page for data retrieval; a data-type input function for a user to denote data type to be extracted from a selected block portion; and a search implementation function for implementing a search under the search system. The data type entered by the data-type input function is associated with a WEB page block selected, and upon search implementation the block selected is searched for the data type requested, and data found is retrieved to be provided to the user.
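The association between a selected block and a data type may be pictured with a short sketch. It is a simplification under stated assumptions: the page is treated as plain text, a block is represented by character offsets, and each data type is mapped to a regular expression through a hypothetical lookup table (`DATA_TYPE_PATTERNS`). The names `BlockSelection` and `search_block` are likewise illustrative and do not describe the actual implementation.

```python
import re
from dataclasses import dataclass

# Hypothetical mapping from a user-entered data-type string to a pattern.
DATA_TYPE_PATTERNS = {
    "price": re.compile(r"\$\d+(?:\.\d{2})?"),
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}


@dataclass
class BlockSelection:
    start: int      # character offset where the selected block begins
    end: int        # character offset just past the selected block
    data_type: str  # data type the user associated with this block


def search_block(page_text, selection):
    """Search only the selected block for the associated data type."""
    block = page_text[selection.start:selection.end]
    pattern = DATA_TYPE_PATTERNS[selection.data_type]
    return pattern.findall(block)


# Example: restrict a price search to the fares portion of a page.
page_text = "Posted 3/14/2000. Fares: Boston $129.00, Denver $210.50. Booking fee $5."
fares_block = BlockSelection(
    start=page_text.index("Fares"),
    end=page_text.index("Booking"),
    data_type="price",
)
print(search_block(page_text, fares_block))
```

Because the search is confined to the selected block, the booking fee outside the block is not returned, even though it matches the same data type.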
In preferred embodiments block selection is by click and drag techniques as used in blocking text for a word processor, and data types are entered as natural language strings. Multiple blocks may be selected and a data-type associated with each selected block. In some embodiments search implementation may be initiated as each data block is selected and a data-type is associated with the selected data block, and matched data is immediately retrieved and transmitted to the user. In other embodiments matched data is retrieved and accumulated for a user until the user requests transmission of the accumulated data.
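The two delivery behaviors described above, immediate transmission and accumulation until requested, can be sketched as follows. The class name `MatchDelivery`, the `send` callable, and the block identifiers are hypothetical and serve only to illustrate the distinction.

```python
class MatchDelivery:
    """Deliver matched data immediately, or hold it until the user asks."""

    def __init__(self, send, immediate=True):
        self.send = send            # callable that transmits a list of matches
        self.immediate = immediate  # True: transmit as each match is found
        self.pending = []           # matches accumulated for later transmission

    def on_match(self, block_id, data):
        """Handle data matched in one selected block."""
        if self.immediate:
            self.send([(block_id, data)])
        else:
            self.pending.append((block_id, data))

    def flush(self):
        """Transmit accumulated matches on user request, then clear them."""
        if self.pending:
            self.send(self.pending)
            self.pending = []
```

In accumulate mode nothing is transmitted until `flush()` is called, at which point all pending matches are sent as one batch.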
In a preferred embodiment of the invention the search system is implemented between a user station and an Internet Portal server, with the block selection and configuration function and the data-type input function executing on the user station, and the navigation and data retrieval functions executing on the Portal server. In these cases the user operates through a portal server to access and configure WEB pages, and the block selection and data-type association functions generate a data-type definition (DTD) file associated with the WEB page, listing the selected blocks and associated data types for the page. The user in these cases has a home page on the portal server listing URLs visited regularly by the user, and the system saves the DTD files created by the user for the user's regularly visited pages in a manner such that the search system may be initiated by the user for selected pages from the home page and, when initiated, searches the selected pages according to the stored DTD for each page.
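A data-type definition (DTD) file of the kind described, listing the selected blocks and associated data types for a page, might be represented as sketched below. The file format is not specified here, so the JSON layout, the offset-based block representation, and the names `make_dtd`, `DtdStore`, and `run_saved_searches` are all assumptions made for illustration. The `search_page` argument stands in for the portal server's navigation and retrieval functions.

```python
import json


def make_dtd(url, selections):
    """Build a DTD record: selected blocks and their data types for one page."""
    return {
        "url": url,
        "blocks": [
            {"start": start, "end": end, "data_type": data_type}
            for start, end, data_type in selections
        ],
    }


class DtdStore:
    """Keep one serialized DTD per URL, as a portal server might."""

    def __init__(self):
        self._files = {}

    def save(self, dtd):
        self._files[dtd["url"]] = json.dumps(dtd)

    def load(self, url):
        return json.loads(self._files[url])

    def urls(self):
        return sorted(self._files)


def run_saved_searches(store, selected_urls, search_page):
    """Search each selected page according to its stored DTD.

    search_page(url, blocks) is a hypothetical callable that navigates
    to the page and returns the data found in the listed blocks.
    """
    return {
        url: search_page(url, store.load(url)["blocks"])
        for url in selected_urls
    }
```

Storing one record per URL lets a search be started from the home page for any subset of the saved pages without the user repeating the block selection.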
In another aspect of the invention a method for searching WEB pages by a user for specific data is provided, comprising steps of (a) navigating to a WEB page by the user via a browser function; (b) selecting a specific block of the WEB page by the user using a block selection and configuration function having input tools for a user to select at least one block portion of a displayed WEB page for data retrieval; (c) inputting a data type to be associated with a selected block using a data-type input function; (d) initiating a search; and (e) retrieving information from the data block according to the data type input.
In preferred embodiments, in step (b), block selection is by click and drag techniques as used in blocking text for a word processor, and data types may be entered as natural language strings. Also, in some embodiments multiple blocks may be selected and a data-type associated with each selected block. In some cases search implementation is initiated as each data block is selected and a data-type is associated with the selected data block, and matched data is immediately retrieved and transmitted to the user, while in other cases matched data is retrieved and accumulated for a user until the user requests transmission of the accumulated data.
In some preferred embodiments of the method the search system is implemented between a user station and an Internet Portal server, with the block selection and configuration function and the data-type input function executing on the user station, and the navigation and data retrieval functions executing on the Portal server. In some of these embodiments, wherein the user operates through a portal server to access and configure WEB pages, the block selection and data-type association functions generate a data-type definition (DTD) file associated with the WEB page, listing the selected blocks and associated data types for the page. Also in many such cases the user has a home page on the portal server listing URLs visited regularly by the user, and the system saves the DTD files created by the user for the user's regularly visited pages in a manner such that the search system may be initiated by the user for selected pages from the home page and, when initiated, searches the selected pages according to the stored DTD for each page.
In embodiments of this invention described in enabling detail below, for the first time a fast and efficient system is provided for enabling a user/subscriber to retrieve data from semi-structured data sources.