1. Field of the Invention
The present invention relates to a method and apparatus for processing a Chinese teletext using computer technology, in order to extract key phrases from a teletext residing in a television set or a computer and thereby assist a user in capturing the essential meaning of the teletext.
2. Related Prior Art
Recently, as computer networks such as the Internet have become very popular and numerous teletexts exist on them, how to retrieve desired data through a network rapidly and precisely has become an important issue.
An existing text retrieval technology enables a user to retrieve desired data or information based on a key word or phrase. Since such a technology does not require the user to remember the sequence number, abstract, or complete information of the data, the user can obtain the desired data even when he/she knows only incomplete information or a single word or phrase.
Generally, the existing text retrieval technology uses a key word or phrase defined by a user to search a database and retrieve all data associated with that key word or phrase. The existing Chinese text retrieval technology is based on the syntax of Chinese text rather than that of a foreign language, and is a well-known word processing technology available on the commercial market. Such a text retrieval technology is characterized in that data associated with the key word or phrase queried by the user can be retrieved automatically. However, it is effective only for retrieving data within a finite database using a key word or phrase entered by the user, and not for situations in which the user cannot know whether the entered phrases reflect the essential meaning of a desired text. If the key word or phrase entered by the user is improper, the retrieved data will be incorrect or incomplete, and a new key word or phrase has to be conceived and entered to perform the search again.
The need to repeat the search in this way arises because typical databases are so large that users cannot know their complete content and therefore cannot conceive an effective key word or phrase.
Thus, if the user knows several key phrases of a teletext, the essential meaning of the teletext can be understood. If a text retrieval system incorporates the method according to this invention to extract key phrases from the teletext currently being processed, the user can grasp most of the essential content of the teletext before actually reading it. If the extracted phrases are then used to locate detailed descriptions in the teletext, the important portions of its content can be displayed rapidly.
In the prior text retrieval technology, if the user does not know the exact key phrase, considerable time is consumed in performing searches with assumed key phrases. If the user can obtain the actual key phrases of a desired text in advance, he/she can easily capture the essential content of the text.
Therefore, the requirements for using a prior text retrieval system are as follows:
The data to be retrieved is restricted to certain specific fields, in other words, to a closed environment; and
The user cannot enter a specific phrase or word for executing the search unless he/she already knows about the desired data to a certain extent before performing it.
A well-known text retrieval system, the "Chinese Text Retrieval System" proposed by Academia Sinica, for example, is adapted only to searching data in the fields of Chinese literature and history, and requires an associated index phrase to retrieve the desired data. This Chinese Text Retrieval System can also be applied to searching the Bible: the user enters a person's name or a phrase about an allusion, and the associated data is displayed automatically. However, such a text retrieval method consumes a great deal of time in a network having unlimited data fields. Since it is difficult to retrieve the desired data if the user does not know the context of a text, a presumed phrase is normally used to search for a title, and another possible index phrase must be tried in the next search if the desired data cannot be retrieved with the previous one.
In light of the above problems, the present invention provides a method for processing a teletext. The method comprises the steps of: generating, in a memory, a first reference list storing a plurality of Chinese characters that rarely form a phrase with an adjacent character in a Chinese text; generating, in the memory, a second reference list storing a plurality of Chinese characters that are sometimes used as conjunctions and sometimes form a phrase with an adjacent character in a Chinese text; inputting a teletext; dividing the sentences of the inputted teletext into character strings, using special symbols and the characters contained in the first reference list as separation references; performing statistical calculation on the character strings to extract the character strings containing two or more Chinese characters as Chinese phrases, and storing the Chinese phrases in a Chinese phrase data area; checking the Chinese phrases stored in the Chinese phrase data area against the characters in the second reference list and deleting from the data area any phrase unsuitable to be a meaningful phrase; deriving a density value for each Chinese phrase stored in the Chinese phrase data area using a statistical formula based on the frequency of the phrase, the number of characters in it, and the number of second-reference-list characters it contains, the frequency of a Chinese phrase being the number of times it appears in the teletext; and selecting a plurality of phrases having the highest density values as key phrases and outputting the selected key phrases for display.
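The first stages of the method (dividing the teletext into character strings, extracting candidate phrases statistically, and checking them against the second reference list) can be illustrated with a minimal Python sketch. It rests on several assumptions that the text does not state: the contents of FIRST_LIST and SECOND_LIST are hypothetical examples, any character outside the basic CJK Unified Ideographs block is treated as a "special symbol", the statistical calculation is taken to be a count of repeated substrings of two or more characters, and the second-list check deletes a phrase that is merely another candidate with a second-list character attached at one end. The function names are likewise illustrative.

```python
import re
from collections import Counter

# Hypothetical example lists: the patent does not enumerate the characters.
FIRST_LIST = set("的了嗎呢")   # rarely form a phrase with a neighbouring character
SECOND_LIST = set("和及與")     # sometimes a conjunction, sometimes part of a phrase


def split_strings(text, first_list=FIRST_LIST):
    """Divide text at special symbols (assumed: any non-CJK character)
    and at every first-list character; empty pieces are discarded."""
    pattern = "[^\u4e00-\u9fff]"
    if first_list:
        pattern += "|[" + "".join(re.escape(c) for c in sorted(first_list)) + "]"
    return [s for s in re.split(pattern, text) if s]


def extract_phrases(strings, max_len=6, min_freq=2):
    """Assumed statistic: count every substring of 2..max_len characters,
    keep those seen at least min_freq times, then drop any candidate that is
    contained in a longer candidate with the same frequency."""
    counts = Counter(
        s[i:i + n]
        for s in strings
        for n in range(2, max_len + 1)
        for i in range(len(s) - n + 1)
    )
    cands = {p: f for p, f in counts.items() if f >= min_freq}
    return {
        p: f for p, f in cands.items()
        if not any(len(q) > len(p) and p in q and g == f for q, g in cands.items())
    }


def filter_phrases(phrases, second_list=SECOND_LIST):
    """Assumed check: delete a phrase that is just another candidate phrase
    with a second-list character attached at its start or end."""
    def attached(p):
        return (p[0] in second_list and p[1:] in phrases) or (
            p[-1] in second_list and p[:-1] in phrases)
    return {p: f for p, f in phrases.items() if not attached(p)}
```

For example, split_strings("數位電視的節目和數位電視的廣告。數位電視節目") yields four character strings, and extract_phrases on them keeps 數位電視 (frequency 3) and 節目 (frequency 2). Requiring the remainder to be a candidate itself keeps a genuine two-character phrase such as 和平 from being deleted merely because it begins with 和.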
In addition, the present invention provides an apparatus for processing a teletext. The apparatus comprises: a memory means for storing a first reference list and a second reference list, the first reference list storing a plurality of Chinese characters that rarely form a phrase with an adjacent character in a Chinese text, and the second reference list storing a plurality of Chinese characters that are sometimes used as conjunctions and sometimes form a phrase with an adjacent character in a Chinese text; an inputting means for inputting a teletext; a processor for executing processing operations including inputting the teletext from the inputting means, dividing the sentences of the inputted teletext into character strings using special symbols and the characters contained in the first reference list as separation references, performing statistical calculation on the character strings to extract the character strings containing two or more Chinese characters as Chinese phrases and storing the Chinese phrases in a Chinese phrase data area, checking the Chinese phrases stored in the Chinese phrase data area against the second reference characters and deleting from the data area any phrase unsuitable to be a meaningful phrase, deriving a density value for each Chinese phrase stored in the Chinese phrase data area using a statistical formula based on the frequency of the phrase, the number of characters in it, and the number of second reference characters it contains, the frequency of a Chinese phrase being the number of times it appears in the teletext, and selecting a plurality of phrases having the highest density values as key phrases and outputting the selected key phrases for display; and a display means for receiving and displaying the key phrases output from the processor.
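The last two processor operations (deriving density values and selecting the key phrases for display) might look like the following Python sketch. The text names the three inputs of the density formula (frequency, number of characters, number of second-reference characters) but not how they are combined, so the formula used here, frequency times length divided by one plus the number of second-list characters, is purely an assumption, as are the example SECOND_LIST and the function names.

```python
# Hypothetical second reference list; the patent does not enumerate it.
SECOND_LIST = set("和及與")


def density(phrase, freq, second_list=SECOND_LIST):
    """Assumed formula: freq * len(phrase) / (1 + k), where k is the number
    of second-list characters in the phrase. The patent names these inputs
    but does not say how they are combined."""
    k = sum(ch in second_list for ch in phrase)
    return freq * len(phrase) / (1 + k)


def select_key_phrases(phrases, top_n=3, second_list=SECOND_LIST):
    """Rank {phrase: frequency} by descending density, breaking ties by the
    phrase string, and return the top_n phrases for display."""
    ranked = sorted(
        phrases.items(),
        key=lambda item: (-density(item[0], item[1], second_list), item[0]),
    )
    return [p for p, _ in ranked[:top_n]]


if __name__ == "__main__":
    # Stand-in for the display means: print the selected key phrases.
    print(select_key_phrases({"數位電視": 3, "節目": 2, "和平協議": 3}, top_n=2))
```

Under these assumptions, select_key_phrases({"數位電視": 3, "節目": 2, "和平協議": 3}, top_n=2) returns ["數位電視", "和平協議"]: their densities are 12.0 and 6.0, against 4.0 for 節目, with 和平協議 penalized for containing 和. Sorting on (-density, phrase) keeps the selection deterministic when densities tie.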
The method and apparatus for processing the teletext according to this invention have the following improved effects and advantages:
This invention enables a user to rapidly capture the meaning of a Chinese teletext from the Internet, which provides virtually unlimited information resources, and speeds up the processing of large volumes of Chinese teletext.
This invention is particularly advantageous for digital television. It enables the user to acquire key Chinese phrases from the mass of Chinese information provided by digital television in a simple manner, so that the time required to acquire information is reduced and the user does not have to read the information in detail unless necessary.
In the text retrieval field, this invention can extract the key phrases required by a text retrieval system.