1. Field of the Invention
The present invention relates to a document display apparatus for appropriately arranging and displaying a plurality of documents written in different formats, and more specifically to an apparatus and method for appropriately arranging and displaying the distribution of occurrences of documents in terms of a plurality of time elements such as year, month, day of the week, and day, and also to a program storage medium for realizing the document display apparatus.
With the remarkable development of personal computers and network communications, electronic documents transmitted among users, such as text files, electronic mail, and news data (for example, network news data) distributed through a network, have strikingly increased in both variety and volume. Under these circumstances, there is a strong demand for a technology of arranging and displaying these documents based on their contents.
Most of these documents contain information about a date and time (date information) relating to the document, such as the date of a lecture meeting or the deadline for manuscripts.
2. Description of the Related Art
The present invention generally relates to a data visualizing technology. Conventionally, data visualization has been applied to formatted information such as a database. Formatted information can be easily processed to visualize desired data by retrieving a specified field and combining it with existing graphing software, etc.
However, documents such as text files, electronic mail, and network news are not always stored in a specified format. The operating system allows information such as a file name, a file size, a creation date, and an author to be added to the document as file attributes. However, these attributes are not enough to indicate the contents of a document. For example, if a document announces a lecture meeting, the date of the lecture meeting cannot be obtained until the document is actually read.
Furthermore, with the explosive increase of free-format document information transmitted through the Internet, etc., there is a demand for a technology, such as an Internet search engine, that searches entire documents for a character string. However, a desired document cannot always be obtained by detecting a specified character string in the document. That is, a retrieval result may contain noise, i.e. unwanted information.
Additionally, since there is a large volume of documents to be searched, and similar words are combined by OR-processing to reduce retrieval omissions, the number of retrieved documents is very large. Thus, a data visualizing technology is earnestly demanded for extracting effective information from a large volume of noisy retrieval results.
Under these circumstances, there has been no method or apparatus for visualizing information in a comprehensible manner by displaying, based on the contents of various documents, when an event frequently occurs and how the occurrence of the event changes depending on the year, month, day of the week, or day.
As described above, there has been a technology of retrieving a specified event from various documents, but there has never been an apparatus for visualizing frequency information about documents. It is considered that statistically effective information can be obtained by observing a specified event in a large volume of document information if data is output as to, for example, in what month food poisoning frequently occurs, or on what days of the week a larger number of traffic accidents take place.
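The kind of frequency information described above can be illustrated with a minimal sketch. It is not the method of the present invention. It assumes, purely for illustration, that dates appear in the document text in the form YYYY-MM-DD; the names DATE_PATTERN, extract_dates, tally, and the sample documents are hypothetical.

```python
import re
from collections import Counter
from datetime import date

# Assumption: dates are written as YYYY-MM-DD inside free-format text.
DATE_PATTERN = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]


def extract_dates(text):
    """Return every valid calendar date found in the text."""
    found = []
    for year, month, day in DATE_PATTERN.findall(text):
        try:
            found.append(date(int(year), int(month), int(day)))
        except ValueError:
            # Skip strings that look like dates but are not (e.g. month 13).
            continue
    return found


def tally(documents):
    """Count date occurrences by month number and by day of the week."""
    by_month = Counter()
    by_weekday = Counter()
    for text in documents:
        for d in extract_dates(text):
            by_month[d.month] += 1
            by_weekday[WEEKDAY_NAMES[d.weekday()]] += 1
    return by_month, by_weekday


# Hypothetical sample documents, invented for this illustration only.
documents = [
    "Food poisoning was reported at a school on 1998-07-14.",
    "A second food poisoning case followed on 1998-07-21.",
    "Another outbreak occurred on 1998-08-03; ignore 1998-13-40.",
]

by_month, by_weekday = tally(documents)
print(sorted(by_month.items()))
print(sorted(by_weekday.items()))
```

In this sketch the counts per month and per day of the week are the raw data that a display apparatus would then arrange graphically; real documents would require recognizing many more date notations than the single pattern assumed here.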
Based on the above-described background, the present invention has been developed to solve the following problems.
1. In the prior art technology, a free-format document can only be recognized by a file attribute (a file name, a file size, and an update date), and information must be contained in a document or added to a document in a specific format in order to recognize the document by its contents.
2. The date information relating to a free-format document can only be obtained by actually reading the contents of the document.
3. The distribution of the date information about a document cannot easily be obtained at predetermined time intervals.