The present invention relates to the Internet, and more particularly to an apparatus having an Internet automatic Web browsing function.
As personal computers have become popular in recent years, the Internet has come into wide use. The Internet is a gigantic aggregate of interconnected computer networks. Its main functions include electronic mail, network news (electronic bulletin board or electronic conference), file transfer (FTP: File Transfer Protocol), and the World Wide Web (WWW). In particular, the WWW, an aggregate of hypertext documents coded in a language called HTML (described later), combines various types of information dispersed across the Internet and makes them accessible.
As a prerequisite for understanding the present invention, the following describes the structure and operation of the Internet system for the WWW and the structure of a hypertext document.
As shown in FIG. 24, the distribution of hypertext documents is performed by computers 243 and 244, called WWW servers, on the Internet. A user can use a browser program, called a WWW browser (also called a Web browser), on a client computer (hereafter simply called a client) to access documents on the Internet. Normally, the client computer 241 is connected to the Internet via a service organization called a service provider, which offers a dedicated communication line of its own. That is, the client computer 241 dials up the host computer of the service provider via a public line to access the Internet. This makes it possible for the user to get information (text, image, sound, and so on) from around the world while staying at home. This practice is called network surfing because it is like surfing through waves of information.
A unit of information that is accessed is a file on the WWW server, called a page. Setting up a link, which will be described later, allows the user to jump from one page to another for sequential browsing. The length of a page is variable and may change according to the page creator.
A particular page (home page) on the WWW is assigned an address called a URL (Uniform Resource Locator) which is a unique address on the Internet.
A URL is composed of a protocol name, server name, and an item path name, as shown below.
http://www.abc.or.jp/def/ghi.html
The protocol name indicates the method by which the computer interprets information. Because the WWW server and the Web browser transfer information by the method called HTTP (Hyper Text Transfer Protocol), the protocol name, the first part of the above URL, is "http:". There is also a protocol, called ftp, for use in file transfer. The "www.abc.or.jp" represents a server name. The "www" indicates that the server is a WWW server. The "abc" in "abc.or.jp" indicates an organization name, "or" indicates an organization type (in this example, an organization/individual), and "jp" indicates a code representing a country (in this example, Japan). The item path name "def/ghi.html" after the server name indicates the location of an item on the server. The path name usually indicates the name of a file constituting a page. The "def" in "def/ghi.html" is a directory name, "ghi" is a file name, and "html" is an extension indicating that the file is an html file.
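As an illustration only, the decomposition described above can be reproduced with the Python standard library. This is a minimal sketch; the variable names are assumptions introduced here and are not part of the specification.

```python
from urllib.parse import urlsplit

# The example URL from the text above.
url = "http://www.abc.or.jp/def/ghi.html"
parts = urlsplit(url)

protocol_name = parts.scheme            # "http"
server_name = parts.netloc              # "www.abc.or.jp"
item_path = parts.path.lstrip("/")      # "def/ghi.html"

# Split the item path into directory, file name, and extension.
directory, _, file_with_ext = item_path.rpartition("/")
file_name, _, extension = file_with_ext.rpartition(".")

# Split the server name into its dot-separated labels.
server_labels = server_name.split(".")  # ["www", "abc", "or", "jp"]

print(protocol_name, server_name, directory, file_name, extension)
# prints: http www.abc.or.jp def ghi html
```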
Next, the structure of an HTML file (HTML document) will be described.
As described above, HTML is an abbreviation for Hyper Text Markup Language, and a WWW document is usually written in this language. A document written in this language is called an HTML document, and its file is called an HTML file (or HTML text).
FIG. 20 shows the basic structure of an HTML document. An HTML document, a text file in essence, contains codes called tags, enclosed by the symbols "<" and ">", in a page. Normally, a specified range is bounded by a pair of tags, the start tag and the end tag. The end tag is distinguished from the start tag by "/". Note that some tags are used alone, such as <P>, which indicates the start of a new paragraph. Tags allow character design information and layout information, as well as link information, to be specified. The browser interprets the tags and either displays the HTML document on the screen in the format intended by the creator or controls link operations.
The detailed description of HTML is omitted here because it is well known. As shown in FIG. 20(a), the basic structure of an HTML document contains various types of tags in the text document. When the HTML document is interpreted by the browser and displayed on the screen, the tags are not displayed, as in FIG. 20(b); only the control they specify is reflected on the display. The function which passes control to another page, associated with a character string in the HTML document, when the user executes an operation (for example, a click) on the character string, is called a link. In this specification, such a character string part in the HTML document is also called a link for the sake of convenience. A link 201 in a page of the HTML document "aaa.html", shown in FIG. 20(a), is described as:
<A HREF="bbb.html">BBB</A>
The tags used for setting up a link are called anchor tags (<A . . . > . . . </A>), and the part enclosed by the anchor tags is called an anchor point or a hot point. The "HREF=" in the start tag <A HREF="bbb.html"> of the anchor tags indicates access information on the link destination (in this example, a file name). On the browser screen, the character string "BBB" is highlighted, as in the displayed character string 203 shown in FIG. 20(b). This highlighting is realized by displaying the character string in a color different from other character strings or by underlining it. This lets the user recognize that pointing to this character string causes a jump to some other page.
A link 202 indicates a link to an in-line image. In this case, the image file named "ggg.gif" is displayed on the screen as an image 204. When the user points to the image 204, the content of the link destination "bbb.html" is read and displayed. An in-line image is an image embedded in a page of the HTML document for display.
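The way a browser might pick link information out of such a document can be sketched with Python's standard `html.parser` module. The sample HTML text below is a hypothetical reconstruction in the spirit of FIG. 20(a), which is not reproduced here, and the class name `LinkCollector` is an assumption made for illustration.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects anchor links and in-line image file names from HTML text."""

    def __init__(self):
        super().__init__()
        self.links = []     # (link destination, anchor point text) pairs
        self.images = []    # in-line image file names
        self._href = None   # destination of the anchor currently open
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        # The parser reports tag and attribute names in lowercase.
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self._href = attrs["href"]
            self._text = []
        elif tag == "img" and "src" in attrs:
            self.images.append(attrs["src"])

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

# Hypothetical page content (assumed, not taken from the figures).
sample = (
    '<HTML><BODY><P>Some text <A HREF="bbb.html">BBB</A>'
    '<A HREF="bbb.html"><IMG SRC="ggg.gif"></A></BODY></HTML>'
)

collector = LinkCollector()
collector.feed(sample)
print(collector.links)
print(collector.images)
```

The text link yields the anchor point "BBB", while the image link has an empty anchor text because its anchor point is the image itself.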
There are several patterns used in a link for link destination access information.
As shown in FIG. 21, when a link is set up (or created) for another page (or an HTML file) in the same server (host), the file name (sometimes including a directory) is the link destination information. FIG. 21(a) shows a link source HTML file and a link destination HTML file. FIG. 21(b) shows the content displayed on the browser display screens associated with the respective files. In this example, when the user points to the anchor point character string "BBB", the HTML file "bbb.html", which is a link destination representing another page, is requested and its content is displayed.
As shown in FIG. 22, a link may be set up to some other location in the same page. In such a case, the item name of the location is used as link destination information. As shown in FIG. 22(a), the link source description <A HREF="#aaa">AAA</A> indicates the position of the link destination and, on the other hand, the link destination description <A NAME="aaa">AAA</A> indicates that the item name "aaa" is linked with the source. As shown in FIG. 22(b), when the user points to the highlighted character string "AAA" on the browser screen, display control is passed to the position of the item "AAA", which is in a subsequent location within the same page. This is useful in a long page to display a list of items each having a link to the corresponding item at a subsequent location.
FIG. 23 indicates a link to a location in a separate page in the same server. In this case, the combination of the file name of the separate page and an item name in the document is used as the link destination access information. In this example, control jumps to the item "ppp" in a separate file "bbb.html" in the same server. FIG. 23(a) shows the link source and link destination HTML files, and FIG. 23(b) shows the corresponding browser screens.
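The three patterns of link destination access information described above (a file name, an item name, and a file name combined with an item name) can be resolved against the current page as in the following sketch. The URL of the current page and the helper name `resolve` are assumptions for illustration; the combined form is written "bbb.html#ppp" following ordinary HTML usage.

```python
from urllib.parse import urljoin, urldefrag

# Hypothetical URL of the link source page.
current_page = "http://www.abc.or.jp/def/aaa.html"

def resolve(base, href):
    """Return (page URL, item name) for a link destination given in an HREF."""
    absolute = urljoin(base, href)                # make the destination absolute
    page_url, item_name = urldefrag(absolute)     # separate the "#item" part
    return page_url, item_name

# Another page in the same server (FIG. 21).
print(resolve(current_page, "bbb.html"))
# → ('http://www.abc.or.jp/def/bbb.html', '')

# Another location in the same page (FIG. 22).
print(resolve(current_page, "#aaa"))
# → ('http://www.abc.or.jp/def/aaa.html', 'aaa')

# A location in a separate page in the same server (FIG. 23).
print(resolve(current_page, "bbb.html#ppp"))
# → ('http://www.abc.or.jp/def/bbb.html', 'ppp')
```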
Referring again to FIG. 24, information transfer between a client and the WWW server when accessing WWW will be described briefly.
The user connects the client computer 241 to the Internet and then starts the Web browser. The Web browser on the client computer 241 then requests the WWW server 243 identified by a previously specified URL (which may be changed by the user) to send the content of the page identified by that URL (REQ1). Upon receiving this request, the WWW server 243 returns the HTML text of the page to the client computer 241 (RES1). When the browser receives the text, it analyzes the content and displays it on the screen of the client computer 241. When this page contains an in-line image (or other data such as a sound), the browser requests that information from the server 243 (REQ2). In response, the WWW server 243 returns an image file (RES2). Upon receiving this file, the browser displays the image at the location specified in the page. When the user points to a link in the page displayed on the screen and, for example, the link points to another page on the same WWW server 243, the browser requests the WWW server 243 to send the HTML text of that page (REQ3). In response, the WWW server 243 returns the text (RES3). In addition, when the link destination of the link specified by the user is on a separate WWW server 244, the browser requests the server 244 to send the page information of the link destination (REQ4). In response, the server 244 returns the corresponding page information (RES4). The browser displays the received information on the screen.
A WWW access is made according to such a procedure. The user is able to type an arbitrary URL from the keyboard, instead of specifying a link, to access the page.
Although personal computers have come into use at home, only those with some computer knowledge or operating experience can connect a computer to the Internet and enjoy network surfing. Not all members of a family can enjoy network surfing with ease.
To cope with this situation, TV sets with a built-in Internet connection function and Internet connection apparatuses which can be connected externally to a TV set have recently become available. These TV sets or apparatuses (collectively called information apparatus in this specification), intended for users with no computer knowledge, usually do not have a unit, such as a keyboard, for entering a user instruction into the information apparatus; instead, they have special remote control devices for operation. The browser screen, its menu display, and so on are also designed for that purpose. However, the television is designed to give information continuously without user interaction, while the Internet browser requires user interaction to keep operating; that is, it requires the user to watch the screen and to give operation instructions. Therefore, this operation can sometimes be cumbersome for passive users who are accustomed to the television.
In view of the foregoing, it is an object of the present invention to provide an information apparatus with an Internet automatic Web browsing function which allows the user to receive information passively, as with a television, while keeping the operation required when browsing Internet Webs to a minimum.
An information apparatus with an automatic Web browsing function according to the present invention, comprises access means for accessing documents on the Internet; storage means for storing data of the accessed documents; and automatic Web tracing means for sequentially and automatically tracing link destinations according to a predetermined rule and parameters based on link information defined in the documents stored in the storage means.
This allows even a user with no computer knowledge or operation experience to browse the Web on the Internet automatically and continuously, in much the same manner as the user watches television, without cumbersome operations. Of course, if the user finds interesting information during automatic Web browsing, he or she may suspend or stop automatic browsing to view the information carefully. Today, some televisions offer a screen divided into multiple subscreens (for example, left and right subscreens) to display pictures of separate channels in the separate areas. One of the areas may be used for an Internet screen.
It should be noted that a link-destination document and a link-source document may belong to separate pages in the same Web server, to the same page, or to separate Web servers.
The predetermined rule may relate to either a depth-first search or a width-first search (also commonly called a breadth-first search). The depth-first search is suitable for first sequentially tracing the links associated with information of interest. On the other hand, the width-first search is suitable for first grasping all the link destinations in the current page and then viewing the contents of further destinations linked with each link destination.
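The difference between the two rules can be seen on a small in-memory link structure. This is a minimal sketch under stated assumptions: the page names and the dictionary `LINKS` are hypothetical stand-ins for pages and the links defined in them, and the two functions are textbook traversals, not the specific procedure of the embodiment.

```python
from collections import deque

# Hypothetical link structure: page name -> link destinations, in page order.
LINKS = {
    "top": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1"],
    "a1": [], "a2": [], "b1": [],
}

def depth_first(page, links, seen=None):
    """Follow each link as deep as possible before trying the next link."""
    seen = set() if seen is None else seen
    seen.add(page)
    order = [page]
    for target in links.get(page, []):
        if target not in seen:
            order += depth_first(target, links, seen)
    return order

def width_first(start, links):
    """Visit every link destination of a page before going one level deeper."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order

print(depth_first("top", LINKS))  # ['top', 'a', 'a1', 'a2', 'b', 'b1']
print(width_first("top", LINKS))  # ['top', 'a', 'b', 'a1', 'a2', 'b1']
```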
The parameters may be composed of at least the link depth limit to be used when the link destinations are traced downward in the hierarchy and an interval of time required for a transit from one document to another. The link depth limit, if appropriately set, prevents control from going too far from the start point. The interval of time should preferably be set to a length of time during which the outline of a displayed page (or part of a page) may be identified.
The automatic Web tracing means may comprise a history table, in which a page access information history is stored each time control moves from a link source to a link destination, and a read-page table, which stores information indicating, for each link destination of each link in a page, whether the link destination has been visited, wherein the history table is referenced when control returns from one of the link destinations to the link source during automatic Web browsing and the read-page table is referenced to check for unread links.
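One possible way to picture how the two tables cooperate is sketched below. The data layout is an assumption made for illustration: the history table is modeled as a list holding the path from the start page to the current page, and the read-page table as a dictionary holding, per page, a visited flag for each link destination. The actual table formats of the embodiment may differ.

```python
def trace_with_tables(start, links):
    """Depth-first tracing driven by a history table and a read-page table."""
    history = [start]  # history table: path from the start page to the current page
    # read-page table: page -> {link destination: visited flag}
    read_page = {start: {t: False for t in links.get(start, [])}}
    displayed = [start]

    while history:
        page = history[-1]
        unread = [t for t, done in read_page[page].items() if not done]
        if not unread:
            history.pop()  # no unread links: return to the link source
            continue
        target = unread[0]
        read_page[page][target] = True  # mark this link as read
        if target in read_page:
            continue  # destination already displayed earlier; do not revisit
        read_page[target] = {t: False for t in links.get(target, [])}
        history.append(target)  # record the move from link source to destination
        displayed.append(target)

    return displayed, read_page

# Hypothetical link structure; page "a" also links back to "top".
demo_links = {"top": ["a", "b"], "a": ["a1", "top"], "a1": [], "b": []}
displayed, table = trace_with_tables("top", demo_links)
print(displayed)  # ['top', 'a', 'a1', 'b']
```

When the trace finishes, every flag in the read-page table is set, which is how the sketch knows that no unread links remain.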
The parameters may further include an automatic Web browsing time-out time (limit time). The maximum automatic Web tracing time may be set by specifying the time-out time. The information apparatus may further comprise specifying means for allowing a user to specify the rule and parameters. This enables the user to select the rule and parameters he or she likes.
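The effect of the three parameters (link depth limit, interval of time, and time-out time) can be sketched as follows. All names here (`TraceParams`, `auto_browse`, the field names, and the default values) are hypothetical; the sketch uses a depth-first rule only, and the clock and sleep functions are injectable so the example runs without actually waiting.

```python
import time
from dataclasses import dataclass

@dataclass
class TraceParams:
    depth_limit: int = 2      # how far below the start page links are traced
    interval_s: float = 5.0   # time each page stays on screen
    timeout_s: float = 600.0  # automatic browsing time-out (limit time)

def auto_browse(start, links, params, show, clock=time.monotonic, sleep=time.sleep):
    """Depth-first automatic browsing; returns False if stopped by the time-out."""
    deadline = clock() + params.timeout_s
    seen = {start}

    def visit(page, depth):
        if clock() >= deadline:
            return False  # time-out reached: stop tracing
        show(page)
        sleep(params.interval_s)  # let the user take in the displayed page
        if depth < params.depth_limit:
            for target in links.get(page, []):
                if target not in seen:
                    seen.add(target)
                    if not visit(target, depth + 1):
                        return False
        return True

    return visit(start, 0)

# Usage with a simulated clock (no real waiting); the link structure is hypothetical.
demo = {"top": ["a", "b"], "a": ["a1"], "a1": [], "b": []}
now = [0.0]

def fake_sleep(seconds):
    now[0] += seconds

shown = []
finished = auto_browse("top", demo, TraceParams(depth_limit=1, interval_s=5, timeout_s=100),
                       shown.append, clock=lambda: now[0], sleep=fake_sleep)
print(shown, finished)  # ['top', 'a', 'b'] True
```

With a depth limit of 1 the page "a1" is never shown, since it lies two links below the start page.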
Automatic Web tracing is usually performed while the user is connected to the Internet (that is, on-line), but it may also be done when the user is not connected to the Internet (off-line). In that case, the information apparatus further comprises non-volatile storage means for storing the documents obtained from the Internet, wherein the automatic Web tracing means performs automatic Web tracing with respect to the documents stored in the non-volatile storage means when the information apparatus is not connected to the Internet.
Because connecting to the Internet via a public line incurs telephone charges, storing required documents in the non-volatile storage means in advance and performing automatic Web tracing (in off-line mode) with respect to the stored documents can reduce the cost. That is, off-line browsing eliminates the need for displaying documents on the screen at communication time and does not need to consider the user's browse time, thus reducing the communication time. The information apparatus according to the present invention is not limited to connection to a public line; connection to a leased line is also possible.
In this specification, the public line includes an analog telephone line, a digital line such as an ISDN line, and a CATV line.
It is desirable that the information apparatus further comprise inquiry means for asking a user if a connection is to be made automatically to the Internet when the document at the next link destination is not stored in the non-volatile storage means during off-line automatic Web tracing. This prevents the line from being connected without the user being aware of the connection.
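The role of the inquiry means during off-line tracing can be sketched as below. The names `offline_trace`, `store`, `ask_user`, and `fetch_online` are hypothetical, the stored documents are modeled simply as lists of their link destinations, and skipping a page when the user declines is an assumption, since the behavior on refusal is not stated above.

```python
def offline_trace(start, store, ask_user, fetch_online):
    """Trace stored documents depth-first; ask before going on-line for a missing one."""
    shown, seen, stack = [], {start}, [start]
    while stack:
        page = stack.pop()
        if page not in store:
            if not ask_user(page):
                continue  # assumption: user declined, skip this page and stay off-line
            store[page] = fetch_online(page)  # connect and save the document
        shown.append(page)
        # Push in reverse so the first link in the page is traced first.
        for target in reversed(store[page]):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return shown

# Hypothetical data: "b" and "b1" exist only on the remote server.
remote = {"b": ["b1"], "b1": []}

asked = []
def decline(page):
    asked.append(page)
    return False

local_store = {"top": ["a", "b"], "a": []}
print(offline_trace("top", local_store, decline, remote.__getitem__))
# ['top', 'a']  (the user was asked about "b" and declined)

local_store2 = {"top": ["a", "b"], "a": []}
print(offline_trace("top", local_store2, lambda page: True, remote.__getitem__))
# ['top', 'a', 'b', 'b1']
```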
The documents must be stored in the non-volatile storage means for use in off-line automatic Web tracing. To do so, the information apparatus may comprise automatic downloading means for automatically acquiring not only the specified document on the Internet but also linked documents through the on-line automatic Web tracing function into the non-volatile storage means.
The automatic Web tracing function according to the present invention is implemented by software. The computer program comprises the functions of accessing documents on the Internet; and sequentially and automatically tracing link destinations according to a predetermined rule and parameters based on link information defined in the accessed documents.
Therefore, the present invention includes in its scope a recording medium which stores the program therein. This program may function as a so-called plug-in of an existing browser. The recording medium may be a ROM mounted on a board in the apparatus, a portable nonvolatile recording medium such as a floppy disk, an MD (mini disk), a Zip medium, or a CD (compact disk) ROM, or a fixed secondary storage unit such as a hard disk.