1.1. Field of the Invention
The present invention relates to electronic information processing. In particular, it relates to a method and system for processing a document comprising text information, in which the occurrence of incomplete time-related citations within the text information, in particular citations of a date, is monitored and each incomplete citation is completed.
1.2. Description and Disadvantages of Prior Art
Today many kinds of information are digitized and stored in electronic archives, e.g., in a database. A large portion of such information comprises text, i.e., documents in one or more languages containing words and dates.
The usability of such electronic archives, however, depends on how those documents are indexed, since the index is often used to locate a document. Users often need to find documents that are relevant to a particular date or date range.
A problem, however, is that it is not clear which date to use for the search: e.g., the date on which the document was archived electronically, the date on which its content was written by its author, or the date on which it was published.
Content-related dates are often incomplete, e.g., “25 of march” or “in February this year”. “This year” is a vague time-related citation, which is why such a date cannot be used in the prior art for indexing or for other purposes in which it is important to know the precise year of the date.
A prior art text search method, known as “full text search”, is available in many word processing applications. In this method, the whole content of a document is parsed in order to find a date indication that meets the search pattern. The disadvantage is that an incomplete or differently written date occurring in the text of a document will not appear in the hit list, because the existing technology requires a 1:1 match of letters and symbols. For example, a query “25/3” requires the document to contain the exact character sequence “2”, “5”, “/”, “3”.
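The limitation of exact matching can be illustrated with a minimal sketch. The function name `full_text_hits` and the sample documents are hypothetical and serve only as an illustration; the sketch assumes that full text search amounts to a plain substring test.

```python
def full_text_hits(query, documents):
    """Return the names of documents that contain the query
    as an exact, contiguous character sequence."""
    return [name for name, text in documents.items() if query in text]


# Hypothetical sample documents: all three carry date information,
# but only the first one spells it exactly like the query.
documents = {
    "memo_a": "The meeting took place on 25/3 in the main office.",
    "memo_b": "The meeting took place on the 25 of march.",
    "memo_c": "Results are expected in February this year.",
}

hits = full_text_hits("25/3", documents)
print(hits)
```

Only the document that contains the literal characters of the query is returned; the document that cites the same day as “25 of march” is missed.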
In the prior art it is known to transform many different representations of time information into a unified format in order to make them comparable by the computer. One example of such a canonical time indication is a language-independent date format such as “YYYY-MM-DD”, as defined in the ISO 8601:2000 standard.
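Such a transformation into a unified format might look as follows. This is a sketch only: the list `KNOWN_FORMATS` and the function `to_canonical` are assumptions made for illustration, English month names are assumed, and the canonical form used is the one produced by Python's `date.isoformat()`, i.e. “YYYY-MM-DD”.

```python
from datetime import datetime

# Hypothetical list of input representations this sketch understands.
KNOWN_FORMATS = ("%d.%m.%Y", "%d/%m/%Y", "%d %B %Y", "%B %d, %Y", "%Y-%m-%d")


def to_canonical(text):
    """Return the date in YYYY-MM-DD form, or None if the text
    is not a complete date in one of the known formats."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next representation
    return None


for sample in ("25.03.2004", "25 March 2004", "March 25, 2004", "25 of march"):
    print(sample, "->", to_canonical(sample))
```

Complete dates in different spellings map to the same canonical string and thus become comparable, whereas an incomplete citation such as “25 of march” cannot be normalized at all.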
Even in this prior art approach there is no mechanism to handle or properly complete incomplete date information. If a date indication contained in, or relating to, the text of a document is incomplete, for example because the year indication is missing, prior art completion methods are limited to an obvious completion, e.g., with the current year as the “YYYY” indication.
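The obvious completion mentioned above can be sketched as follows. The function name `complete_with_current_year` and the years used are purely illustrative assumptions; the processing date is passed in explicitly so that the example is reproducible.

```python
from datetime import date


def complete_with_current_year(day, month, today):
    """Complete a day/month citation with the year of the processing date."""
    return date(today.year, month, day)


# Hypothetical case: the citation "25 of march" is processed on 1 June 2005.
completed = complete_with_current_year(25, 3, today=date(2005, 6, 1))
print(completed.isoformat())
```

The completed year is simply that of the processing date. If the document was in fact written in an earlier year and only processed later, the completed date is wrong, which is the shortcoming described above.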