The present invention relates to an information collecting apparatus for collecting information from WWW (World Wide Web) sites and, more particularly, to an information collecting apparatus that enables quick and accurate collection of the information required by a user.
The amount of information available on WWW sites has been increasing along with the recent rapid development of the Internet. A browser is used to collect and browse this enormous body of information. To collect and browse information, the user starts up a browser on a client terminal, supplies a URL (Uniform Resource Locator) that specifies the site containing the information, and collects the desired information.
However, given the significant increase in the amount of information on WWW sites in recent years and the high frequency with which that information is updated, it is becoming increasingly difficult to collect the latest information quickly, because the browser must be started up again each time the desired information is to be collected.
FIG. 9 is a block diagram showing the general configuration of an information collecting apparatus based on the conventional technology. The information collecting apparatus shown in this figure is connected to a network such as the Internet and collects information from WWW sites. This information collecting apparatus comprises a display 101 for displaying data, an input device 102 comprising a keyboard or a pointing device such as a mouse, a memory 103 for storing scrap data identifying information (described later) and the like, and a computer 104 that controls the display 101, the input device 102, and the memory 103 to execute various kinds of processing.
FIG. 10 is a functional block diagram of the information collecting apparatus based on the conventional technology. As shown in this figure, the information collecting apparatus comprises a user interface 201 through which a user specifies a particular area of a WWW document, a scrap data identifying information generating section 202 for generating scrap data identifying information used to identify the data specified by the user inside a WWW document, a scrap information memory 203 for storing, as scrap information, a set consisting of the URL of the WWW document specified by the user and the scrap data identifying information, and a scrap page updating section 207.
The scrap page updating section 207 comprises a WWW document collecting section 205 for collecting the WWW document corresponding to the specified URL from a WWW server 208 via the Internet (not shown), a data extracting section 204 for cutting out a portion of a newly collected WWW document according to the scrap data identifying information, and an extracted data linking section 206 for linking the pieces of extracted data together to form one document.
In the description below, the data that a user specifies on the user interface 201 is called "scrap data", the information for identifying the starting point and the end point of the scrap data inside a WWW document is called "scrap data identifying information", and a set consisting of the URL of a WWW document in which the user has specified scrap data and the scrap data identifying information is called "scrap information". Any kind of interface may be employed as the user interface 201 for specifying the scrap data, provided that it can identify the URL of the WWW document containing the data required by the user as well as the starting point and the end point of that data in the WWW document. One example of the user interface 201 is a browser having a function for selecting text on the display.
When the browser is used as the user interface 201, the user starts up the browser and selects a particular portion of the document as shown in FIG. 13, which is a view showing an example screen display in which scrap data is selected on the browser. The selected portion, shown in FIG. 13 as a hatched area but in practice possibly displayed in reverse video, represents the scrap data required by the user.
When the scrap data is specified using the user interface 201 as described above, the URL of the WWW document currently displayed on the browser is stored in the scrap information memory 203. Further, the browser (the user interface 201) transfers the displayed WWW document, in the form of an HTML (HyperText Markup Language) document, together with the data specified by the user as scrap data, to the scrap data identifying information generating section 202.
The scrap data identifying information generating section 202 generates, from the HTML document and the scrap data, the scrap data identifying information for identifying the starting point and the end point of the scrap data in the WWW document, and this information is stored in the scrap information memory 203. The scrap data identifying information is used later by the data extracting section 204 to collect the information required by the user from a newly collected WWW document. The scrap data identifying information must therefore satisfy the condition that it is highly likely to remain in the WWW document even after that document is changed at the WWW site (WWW server 208).
As an example of scrap data identifying information satisfying this condition, the contents of the initial line of the scrap data and the contents immediately before the starting point or immediately after the end point of the scrap data may be considered. Generally, the user specifies as scrap data a portion of a WWW document that is likely to change, but in many cases the contents before and after that portion do not change. Thus, the contents of the line immediately before the scrap data, the initial line of the scrap data, and the line immediately after the scrap data are important. Accordingly, in the conventional type of information collecting apparatus, it is assumed that the contents of these three lines are stored in the scrap information memory 203.
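The three-line scheme described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the cited publication: the user's selection is assumed to have already been mapped to 0-based line numbers `start` and `end` in the HTML source, lines are compared after stripping surrounding whitespace, and the names `IdentifyingInfo` and `generate_identifying_info` are hypothetical. The sample page is a hypothetical stand-in that reuses only the three lines quoted for FIG. 12; the line between them is invented.

```python
from typing import NamedTuple


class IdentifyingInfo(NamedTuple):
    """Hypothetical container for scrap data identifying information."""
    line_before: str   # line immediately before the scrap data ("" if none)
    initial_line: str  # first line of the scrap data
    line_after: str    # line immediately after the scrap data ("" if none)


def generate_identifying_info(html: str, start: int, end: int) -> IdentifyingInfo:
    """Describe the scrap data occupying lines start..end (0-based, inclusive)."""
    lines = [line.strip() for line in html.splitlines()]
    if not 0 <= start <= end < len(lines):
        raise ValueError("scrap data range lies outside the document")
    line_before = lines[start - 1] if start > 0 else ""
    line_after = lines[end + 1] if end + 1 < len(lines) else ""
    return IdentifyingInfo(line_before, lines[start], line_after)


# Hypothetical page reusing the three lines quoted for FIG. 12.
page = "\n".join([
    "<BODY>",
    "Today's top news",
    "15:00 10/21 Update",
    "(news text selected by the user)",
    "<HR>",
    "</BODY>",
])
print(generate_identifying_info(page, 2, 3))
```

Only the three anchor lines are kept, so the record stays small even when the selected scrap data spans many lines.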
FIG. 11 is a view showing an example of the scrap information stored in the scrap information memory 203. As shown in this figure, the contents of the line immediately before the scrap data, the initial line of the scrap data, and the line immediately after the scrap data are stored in the scrap information memory 203 in correlation with the URL of the WWW document specified by the user. More specifically, when the user specifies the scrap data (the portion displayed in reverse video in FIG. 13) while the HTML document shown in FIG. 12 is displayed on the browser as shown in FIG. 13, the information shown in the third line of FIG. 11 is stored in the scrap information memory 203. Namely, the line immediately before the scrap data is "Today's top news" (refer to FIG. 12), the initial line of the scrap data is "15:00 10/21 Update" (refer to FIG. 12), and the line immediately after the scrap data is <HR> (refer to FIG. 12). It should be noted that <HR> is a tag representing a horizontal line.
When the information is stored in the scrap information memory 203 as described above and a request for collecting the latest WWW document is issued by the user, in other words, when the user starts up the browser, the WWW document collecting section 205 collects the latest WWW document corresponding to, for instance, the URL described in the third line of FIG. 11 from the WWW server 208 via the Internet (not shown). When the WWW document is collected, the data extracting section 204 identifies the starting point and the end point of data required by the user from the latest WWW document collected anew according to the scrap information stored in the scrap information memory 203, and extracts the data enclosed within the starting point and the end point. Then, the extracted data linking section 206 links the data extracted in the data extracting section 204 to other data and forms one HTML document. For details of the conventional type of information collecting apparatus, refer to Japanese Patent Laid-Open Publication No. HEI 10-187753.
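The extraction and linking steps can likewise be sketched in Python. The matching rule is an assumption made for illustration, since the text above does not fix one: the scrap data is taken to start on the line after the first line equal to the stored "line before" (falling back to the stored initial line if that anchor is missing) and to end just before the next line equal to the stored "line after". The function names, the sample pages, and the `<HR>` separator used when linking are hypothetical.

```python
from typing import List, NamedTuple, Optional


class IdentifyingInfo(NamedTuple):
    line_before: str   # line immediately before the scrap data
    initial_line: str  # first line of the scrap data when it was specified
    line_after: str    # line immediately after the scrap data


def extract_scrap_data(html: str, info: IdentifyingInfo) -> Optional[str]:
    """Data extracting section 204 (sketch): cut the scrap data out of a
    newly collected document, or return None if it cannot be located."""
    lines = [line.strip() for line in html.splitlines()]
    start = None
    if info.line_before:
        for i in range(1, len(lines)):
            if lines[i - 1] == info.line_before:
                start = i
                break
    if start is None:
        # The initial line belongs to the changeable scrap data, so it is only a fallback.
        if info.initial_line not in lines:
            return None
        start = lines.index(info.initial_line)
    if not info.line_after:
        return "\n".join(lines[start:])  # scrap data ran to the end of the page
    for j in range(start, len(lines)):
        if lines[j] == info.line_after:
            return "\n".join(lines[start:j])
    return None


def link_extracted_data(fragments: List[str], title: str = "Scrap page") -> str:
    """Extracted data linking section 206 (sketch): join fragments into one HTML document."""
    body = "\n<HR>\n".join(fragments)
    return f"<HTML><HEAD><TITLE>{title}</TITLE></HEAD><BODY>\n{body}\n</BODY></HTML>"


# Usage: the stored record from FIG. 11 applied to a hypothetical newer page.
info = IdentifyingInfo("Today's top news", "15:00 10/21 Update", "<HR>")
latest = "<BODY>\nToday's top news\n18:30 10/21 Update\n(new news text)\n<HR>\n</BODY>"
print(link_extracted_data([extract_scrap_data(latest, info)]))
```

Anchoring on the surrounding lines rather than the initial line lets the sketch recover the scrap data even though its timestamp line has changed.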
A conventional type of information collecting apparatus is explained above in which, by specifying a starting point and an end point of the data required by the user in a WWW document, a latest WWW document is collected whenever a browser is started up and the specified data in the latest WWW document is extracted. Thus, in the conventional type of information collecting apparatus, it is possible to collect the data required by a user from a WWW document more easily as compared to a primitive method of collecting the data by starting up the browser and visually checking the data on the browser.
In the conventional type of information collecting apparatus, however, the browser must be started up before data can be collected automatically from a WWW document, and this operation requires an operator. Further, once the time needed to start up the browser is taken into account, the information cannot always be said to be collected quickly. In particular, when a WWW document is to be collected from a popular WWW site, or when a WWW document with high information value is to be collected, access concentrates on the corresponding WWW site and line traffic becomes heavy, so the required information may not be collected even when the browser is started up.
Therefore, under the conditions described above, a desired WWW document cannot be collected when and as required. In addition, such a WWW site has to be accessed many times at specified intervals, and the browser must be started up for every access, so labor costs increase. One conceivable way to solve this problem is to access the WWW site late at night, when line traffic is generally lighter. However, this method requires the browser to be started up late at night, so it is not practical in view of the burden that late-night work places on the user.
In addition, when real-time information such as stock price information is to be collected from a WWW site using the conventional type of information collecting apparatus, the latest stock price information has to be collected continually. In many cases, however, stock prices fluctuate in response to external factors (e.g., fluctuations in the official rate). When those external factors do not change, collecting the latest stock price information continually yields no useful information (such as sharp rises and falls in a stock price), so only useless information may be collected. Therefore, when the conventional type of information collecting apparatus is applied to collecting stock prices or similar information, the required information is not necessarily collected accurately.
It is an object of the present invention to provide an information collecting apparatus which can quickly and accurately collect information via a network.
In the present invention, as initialization, a flag setting unit starts up a browser, collects information from a server terminal via the Internet, and sets a flag at a target data cell in the collected information. When this flag setting is completed, an information collecting unit collects the latest information without starting up the browser, and a data cell collecting unit collects, from the collected information, the latest data cell corresponding to the data cell at which the flag has been set. Thus, with the present invention, once a flag has been set, information (a data cell) can be collected quickly without starting up a browser, that is, without requiring user involvement.
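A minimal Python sketch of how the information collecting unit and the data cell collecting unit could cooperate once a flag exists. The locator stored in the flag (the position of a `<td>` cell in the page) is an assumption, since the text does not say how a flagged data cell is identified, and `Flag`, `data_cells`, `fetch_without_browser` and `collect_latest_cell` are hypothetical names. Here "without starting up the browser" is taken to mean fetching the page directly over HTTP with the standard library.

```python
import urllib.request
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Callable, List


@dataclass(frozen=True)
class Flag:
    url: str         # page on which the flag was set
    cell_index: int  # hypothetical locator: position of the flagged <td> cell


class _CellParser(HTMLParser):
    """Collect the text of every <td> cell in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.cells: List[str] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data


def data_cells(html: str) -> List[str]:
    parser = _CellParser()
    parser.feed(html)
    parser.close()
    return [cell.strip() for cell in parser.cells]


def fetch_without_browser(url: str) -> str:
    """Information collecting unit (sketch): a plain HTTP GET, no browser."""
    with urllib.request.urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def collect_latest_cell(flag: Flag,
                        fetch: Callable[[str], str] = fetch_without_browser) -> str:
    """Data cell collecting unit (sketch): re-collect the page and return the
    latest content of the flagged cell."""
    cells = data_cells(fetch(flag.url))
    if flag.cell_index >= len(cells):
        raise LookupError("flagged data cell no longer exists on the page")
    return cells[flag.cell_index]
```

Passing a stub `fetch` keeps the logic testable offline; in use, the default performs the HTTP request.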
In the present invention, when the time in a timer reaches a prespecified value (e.g., a specified time), the information collecting unit collects information from a server terminal without starting up a browser. Thus, with the present invention, the time in a timer can be used as a trigger for collecting the latest information.
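The timer trigger can be sketched as a small Python loop that waits until a clock reaches the prespecified value and then runs the collection once. The function name `collect_at`, the polling interval, and the use of seconds since the epoch as the timer value are assumptions; the clock and sleep functions are injectable so the behaviour can be checked without real waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def collect_at(target: float,
               collect: Callable[[], T],
               clock: Callable[[], float] = time.time,
               sleep: Callable[[float], None] = time.sleep,
               poll_interval: float = 1.0) -> T:
    """Wait until clock() reaches `target` (the prespecified timer value),
    then run `collect` once; no browser is involved."""
    while True:
        remaining = target - clock()
        if remaining <= 0:
            break
        # Never sleep past the target, and never pass a negative duration.
        sleep(min(poll_interval, remaining))
    return collect()
```

In use, `collect` would wrap the step that fetches the page and extracts the flagged data cell.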
In the present invention, as initialization, a flag setting unit starts up a browser, collects information from a server terminal via the Internet, and sets a flag at a target data cell in the collected information. When the flag setting is completed, a trigger information collecting unit starts collecting trigger information, and when the trigger information satisfies a prespecified condition, an information collecting unit collects the latest information without starting up the browser. Furthermore, a data cell collecting unit collects, from the collected information, the latest data cell corresponding to the data cell at which the flag has been set. Thus, with the present invention, information (a data cell) is collected using, as a trigger, trigger information closely related to fluctuations in the information (data cell) to be collected, so redundant collection operations are not carried out.
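One way to express a trigger condition is sketched below in Python. The condition itself is an assumption chosen for illustration, since the text leaves it as "a prespecified condition": collection runs only when the trigger information (for instance, a hypothetical series of official rate readings) has moved by at least `threshold` since the reference value, and that reading then becomes the new reference. `collect_on_trigger` is a hypothetical name.

```python
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def collect_on_trigger(trigger_values: Iterable[float],
                       threshold: float,
                       collect: Callable[[], T]) -> Iterator[T]:
    """Yield a fresh collection only when the trigger information has moved
    by at least `threshold` since the last reference value."""
    reference: Optional[float] = None
    for value in trigger_values:
        if reference is None:
            reference = value  # the first reading only sets the baseline
            continue
        if abs(value - reference) >= threshold:
            reference = value
            yield collect()
```

Small movements in the trigger information produce no collection at all, which is the redundancy the invention aims to avoid.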
In the present invention, in the flag setting unit, a browser-information collecting section collects the information, and a display section then displays the screen information. When the information is displayed, a data cell selecting section selects a target data cell, and a flag setting section sets a flag at that data cell in the information.
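The four sections of the flag setting unit can be pictured as a pipeline. In this Python sketch each section is passed in as a function, so a real browser, screen, and pointing device can be swapped for stubs; representing the collected information as a list of cell texts and the flag as a (URL, cell index) pair are assumptions, and `Flag` and `set_flag` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Flag:
    url: str
    cell_index: int  # hypothetical locator for the selected data cell


def set_flag(url: str,
             browse: Callable[[str], List[str]],
             display: Callable[[List[str]], None],
             select: Callable[[List[str]], int]) -> Flag:
    """Flag setting unit (sketch) built from four injected sections."""
    cells = browse(url)     # browser-information collecting section
    display(cells)          # display section shows the screen information
    index = select(cells)   # data cell selecting section (the user's choice)
    if not 0 <= index < len(cells):
        raise IndexError("selected data cell does not exist")
    return Flag(url, index)  # flag setting section records the target cell
```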
With the present invention, when a latest data cell is collected by the data cell collecting unit, a reporting unit reports to the user that the latest data cell has been collected. Thus, the user can check the contents of the data cell as soon as he/she notices the report.
In the present invention, when a latest data cell is collected by the data cell collecting unit, the reporting unit reports not only that fact but also the contents of the latest data cell to the user. Thus, the user can immediately check the fluctuating information obtained from the data cell.
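The two reporting variants differ only in whether the contents of the latest data cell are included. A minimal Python sketch, assuming a simple text message delivered through an injectable `notify` function (for example `print`, or a hypothetical mail or pop-up notifier); the message wording and the name `report_latest_cell` are illustrative.

```python
from typing import Callable


def report_latest_cell(label: str,
                       latest: str,
                       include_contents: bool = True,
                       notify: Callable[[str], None] = print) -> str:
    """Reporting unit (sketch): tell the user a latest data cell was
    collected, optionally including its contents."""
    if include_contents:
        message = f"New data cell collected for {label}: {latest}"
    else:
        message = f"New data cell collected for {label}."
    notify(message)
    return message
```

`include_contents=False` corresponds to the notification-only variant, and the default corresponds to also reporting the contents.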
Other objects and features of this invention will become apparent from the following description with reference to the accompanying drawings.