1. Field of the Invention
The present invention relates to image data processing techniques relevant to a video editing system for editing video data by attaching various information to video data and a video database system or a video image providing system for managing and retrieving video data, and more particularly, to image data processing techniques for extracting, processing, editing, recording, and displaying telop (caption) information contained in video data so as to enhance the utility of video data at video input, recording, and displaying devices such as TV, VTR, DVD, etc.
2. Description of the Background Art
A technique for detecting a frame that contains characters from a plurality of frames constituting a video image has been studied actively in recent years, and many methods based on an intensity difference between frames have been proposed. Such a method is suitable for the purpose of detecting the first telop character displaying frame among successive frames in which the identical characters are displayed.
However, the video image can contain telop characters that are displayed in motion (referred to hereafter as rolling telop characters), as in the video image of a talk show in which a brief introduction of a person on the show is rolled from left to right on a lower portion of a display screen. In such a case, the intensity difference between successive frames hardly changes immediately after the telop character series starts to appear on the display screen, so that it has been difficult to detect a frame that contains the rolling telop characters by the conventional method.
In addition, the conventional method is also associated with a problem of over-detection in which a plurality of frames displaying the same telop characters are redundantly detected, which is caused when an intensity of a background portion around the characters abruptly changes during the succession of frames in which the identical characters are displayed.
On the other hand, a method based on an edge pair feature point as disclosed in Japanese Patent Application No. 9-129075 (1997) only accounts for the gradient directions of two neighboring edges and does not account for a change of the intensity value between the edges, so that there has been a problem of erroneously detecting a frame with a large intensity change between edges even when no character is displayed on that frame.
As for a technique for extracting information contained in video data, a telop character detection method has been conventionally known. The telop character detection method proposed so far detects an appearance of telop characters using a spatial distribution of feature points that appear characteristically at character portions, and extracts a series of telop characters by utilizing the property that many telop characters remain static on a display screen for some period of time.
However, such a conventional telop character detection method cannot deal with rolling telop characters that are displayed in motion, because of its reliance on the property that many telop characters remain static on a display screen for some period of time.
In order to detect rolling telop characters as a series of telop characters, there is a need to estimate a moving distance of the rolling telop characters, and establish correspondences of telop characters that are commonly displayed over consecutive image frames. Moreover, in order to detect a telop character image (an image of characters themselves) from a video image accurately, there is a need to accurately superpose corresponding character image portions that are commonly displayed over consecutive image frames.
However, the rolling telop characters are often associated with slant or extension/contraction so that a sufficient accuracy cannot be obtained by merely superposing corresponding character image portions using a moving distance of the telop characters as a whole. Consequently, there is also a need to carry out corrections of local displacement or distortion in addition to calculating a moving distance of the telop characters. But there has been no established technique for carrying out the calculation of a moving distance of the telop characters and the local correction accurately in a practically feasible processing time.
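For illustration only, the moving distance estimation referred to above can be sketched as a sum-of-absolute-differences search over candidate horizontal shifts between corresponding scanning lines of two consecutive frames. This is a naive baseline rather than the technique of the invention; the search range `max_shift` and the minimum-overlap guard are assumptions made here:

```python
def estimate_shift(prev_row, cur_row, max_shift=16):
    """Estimate the horizontal moving distance of rolling telop characters
    between two consecutive frames by minimizing the mean absolute
    intensity difference over candidate shifts (a 1-D sketch; the search
    range and the overlap guard are illustrative assumptions)."""
    n = len(prev_row)
    best, best_err = 0, float("inf")
    for s in range(-max_shift, max_shift + 1):
        pairs = [(prev_row[x], cur_row[x + s])
                 for x in range(n) if 0 <= x + s < n]
        if len(pairs) < n // 2:  # ignore shifts with too little overlap
            continue
        err = sum(abs(a - b) for a, b in pairs) / len(pairs)
        if err < best_err:
            best, best_err = s, err
    return best
```

A whole-line estimate of this kind still leaves the local slant and extension/contraction uncorrected, which is exactly the remaining difficulty noted above.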
As for a character region extraction technique that can extract character portions as connected pixel regions stably by a small amount of computations from frame images in which characters are displayed in a plurality of frames constituting color video image or a still color image in which characters are displayed, many studies have been made conventionally, including a character region extraction method proposed in H. Kuwano, S. Kurakake, K. Okada, "Telop Character Extraction from Video data", Proc. of IEEE International Workshop on Document Image Analysis, pp. 82-88, June 1997. See also A. Shio, "An Automatic Thresholding Algorithm Based on an Illumination-Independent Contrast Measure", Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 632-637, San Diego, Calif., Jun. 4-8, 1989.
This method forms connected pixel regions which are adjacent to each other in the image space and which have resembling intensity, saturation, and hue, by carrying out division sequentially in the one-dimensional color spaces of intensity, saturation, and hue, in this order, with respect to the input color image, and then removes from the formed connected pixel regions those regions which do not satisfy the character region criteria.
In this conventional method, the division processing in the intensity space is carried out within a local rectangular region in the image, using a threshold obtained within that rectangular region, so that there is an advantage that a good character region extraction result can be obtained even in the case of a local intensity variation within the image.
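The local thresholding idea can be illustrated by a minimal sketch that binarizes a grayscale image block by block. The block size and the use of the block mean as the threshold are assumptions made here for brevity; the conventional method cited above determines its threshold within the local rectangular region in its own way:

```python
def local_threshold(gray, block=32):
    """Binarize a grayscale image (2-D list of intensities) using a
    threshold computed separately inside each local rectangular block,
    so that a local intensity variation in one part of the image does
    not spoil the extraction elsewhere. The block size and the
    block-mean threshold are illustrative assumptions."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            ys = range(by, min(by + block, h))
            xs = range(bx, min(bx + block, w))
            vals = [gray[y][x] for y in ys for x in xs]
            t = sum(vals) / len(vals)  # block-local threshold
            for y in ys:
                for x in xs:
                    out[y][x] = 1 if gray[y][x] > t else 0
    return out
```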
However, in this conventional method, in the case where the input character displaying color image is a video image of the NTSC signal format that is used by TV broadcasting, there has been a problem that the extracted character region lacks the degraded portions at which the characters are degraded.
Usually, the video image of the NTSC signal format has the features that the original colors are degraded because the color of each pixel is blurred along each scanning line in the image, and that the color of the background portion is blurred into the characters at the left and right boundaries between the characters and the background portion in the image. For the horizontal components within a character, the degradation occurs only at the left and right edges and the central portion is unaffected, but for the vertical components, the entire character portion can be degraded when the character width is narrow, in which case the intensity is lowered such that the intensity contrast between the horizontal components and the vertical components within the character becomes high (see FIG. 26 and FIG. 27).
For this reason, in the above described conventional method, when the threshold is determined within the rectangular region that contains a connecting portion of the horizontal components and the vertical components of a degraded character portion in the video image of the NTSC signal format, the degraded vertical components will be regarded as background so that an incomplete character region will be extracted (see FIG. 28).
Namely, FIG. 26 shows an exemplary case of degradation that occurs within characters displayed in the video image of the NTSC signal format, where the black background color is blurred into alphabetic characters "Acoustic Echo Canceller" and the corresponding Japanese characters shown above such that the vertical components within the characters are degraded into gray.
FIG. 27 is a diagram illustrating the degradation within the character, where the black background color is blurred into an interior of the white telop character "t" on the black background such that the interior color of the character is partially degraded into gray. As for the horizontal components in the character, the degradation occurs only at left and right edges and a central portion is unaffected, as in a region A of FIG. 27. As for the vertical components, the entire portion is degraded because the character width is narrow so that the intensity is lowered, as in a region B of FIG. 27, such that the intensity contrast between the horizontal components and the vertical components within the character becomes large. In such a case, the region B can be regarded as background according to the above described conventional method.
FIG. 28 shows a result of extracting the character region from the color image of FIG. 26 using the above described conventional method, which is an incomplete character region lacking the degraded portions.
Also, usually, characters such as the telop characters displayed in the image have the feature of a very high color contrast with respect to the surrounding portion. However, the above described conventional method forms the connected pixel regions by the division processing in the color space alone and does not account for this feature regarding the color distribution within the image space, so that connected pixel regions with a low color contrast with respect to the surrounding portion are also extracted, and consequently many regions other than the character regions are extracted.
As for a character pattern recognition technique, one example of the conventional method is described in T. Akiyama, N. Hagita, "Automated Entry System for Printed Documents", Pattern Recognition, Vol. 23, No. 11, pp. 1141-1154, 1990. See also Tao Hong, et al., "Visual Similarity Analysis of Chinese Characters and Its Uses in Japanese OCR", Proceedings of the SPIE Symposium, Document Recognition II, SPIE Vol. 2422, pp. 245-253, 1995. In this conventional method, a character pattern that is binarized and its position and size normalized is divided into coarse mesh regions, and a character portion existing in each mesh region is observed from coordinate axes in plural directions. Then, the character pattern is recognized by obtaining the direction contributivity (see Japanese Patent Application Laid Open No. 57-8880 (1982)) of the character lines for the black pixels of the character portion that is traversed by the scanning from each coordinate axis.
This conventional method extracts information from vicinities of black pixels that form a contour portion by observing the character lines, so that there has been a problem that it cannot correctly recognize a character which is often associated with a deformation of the contour portion due to the character line displacement or the image quality degradation.
It is therefore an object of the present invention to provide a scheme for detecting telop character displaying frames in a video image which is capable of suppressing both the erroneous detection of frames without telop characters due to instability of image features and the over-detection of frames displaying the same telop characters redundantly.
It is another object of the present invention to provide a scheme for detecting telop characters in a video image which is capable of detecting the rolling telop characters as a series of telop characters.
It is another object of the present invention to provide a scheme for extracting character regions in the image which is capable of extracting the degraded portion within the high intensity character and suppressing the extraction of regions with a low contrast with respect to the surrounding portion at a time of the character region extraction from the image.
It is another object of the present invention to provide a scheme for character pattern recognition which is capable of obtaining information regarding two-dimensional structure of a character and correctly recognizing a character associated with the contour portion deformation or the character line displacement by using features that are hardly affected by the contour portion deformation or the character line displacement, for a character pattern that is binarized and its position and size normalized.
According to one aspect of the present invention there is provided a method for processing video data, comprising the steps of: (a) entering each input frame constituting the video data; and (b) judging whether each input frame entered at the step (a) is a telop character displaying frame in which telop characters are displayed or not, according to edge pairs detected from each input frame by detecting each two adjacent edge pixels for which intensity gradient directions are opposite on some scanning line used in judging an intensity gradient direction at each edge pixel and for which an intensity difference between said two adjacent edge pixels is within a prescribed range as one edge pair, edge pixels being pixels at which an intensity value locally changes by at least a prescribed amount with respect to a neighboring pixel among a plurality of pixels constituting each input frame.
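The edge pair criterion of step (b) can be illustrated by a short sketch operating on one scanning line. The thresholds `grad_min` (the prescribed minimum local intensity change that makes a pixel an edge pixel) and `diff_max` (the prescribed range for the intensity difference between the two edge pixels) are illustrative assumptions, and comparing the stroke-side intensities of the two edges is one possible reading of the step, not the only one:

```python
def detect_edge_pairs(row, grad_min=30, diff_max=40):
    """Detect edge pairs on one scanning line (a list of intensities).

    An edge pixel is a pixel whose intensity differs from its right
    neighbour by at least grad_min; two adjacent edge pixels with
    opposite gradient directions whose stroke-side intensities agree
    to within diff_max form one edge pair. Thresholds are assumptions."""
    edges = []  # (position, gradient sign, stroke-side intensity)
    for x in range(len(row) - 1):
        d = row[x + 1] - row[x]
        if abs(d) >= grad_min:
            inner = row[x + 1] if d > 0 else row[x]
            edges.append((x, 1 if d > 0 else -1, inner))
    # Pair adjacent edges whose gradient directions are opposite and
    # whose stroke-side intensities lie within the prescribed range.
    return [(x1, x2)
            for (x1, s1, v1), (x2, s2, v2) in zip(edges, edges[1:])
            if s1 == -s2 and abs(v1 - v2) <= diff_max]
```

A frame would then be judged a telop character displaying frame according to the edge pairs so detected over all scanning lines; the judging rule itself belongs to the invention, not to this sketch.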
According to another aspect of the present invention there is provided a method for recognizing character patterns, comprising the steps of: (aaa) dividing character patterns that are binarized into black pixels and white pixels into divided regions each containing one character; (bbb) normalizing position and size of a character in each divided region; (ccc) dividing a character pattern of each normalized character into mesh regions; (ddd) counting a run-length of white pixels which are adjacent in each direction starting from a white pixel existing in each divided mesh region, for a plurality of prescribed directions; (eee) calculating a direction contributivity of each direction as a value obtained by averaging the run-length in each direction by an accumulated value of all the run-lengths for all the prescribed directions as counted by the step (ddd), for each divided mesh region; (fff) calculating a feature value of each divided mesh region by accumulating the direction contributivity of each direction for all white pixels in each divided mesh region and averaging an accumulated value of the direction contributivity of each direction by a number of white pixels within each mesh region; and (ggg) carrying out a processing for recognizing the character pattern of each normalized character using the feature values obtained for all the mesh regions at the step (fff).
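Steps (ccc) through (fff) of this aspect can be sketched as follows, using an assumed 4x4 mesh and four scanning directions (horizontal, vertical, and the two diagonals); counting the starting white pixel in its own run is an implementation choice made here, not something the aspect prescribes:

```python
def contributivity_features(pattern, mesh=4,
                            dirs=((0, 1), (1, 0), (1, 1), (1, -1))):
    """Per-mesh-region feature vectors from white pixel direction
    contributivity, following steps (ccc)-(fff). pattern is a 2-D list
    of 0 (black) / 1 (white), already position/size normalized; the
    mesh size and the four directions are illustrative assumptions."""
    h, w = len(pattern), len(pattern[0])

    def run(y, x, dy, dx):
        # Run-length of adjacent white pixels from (y, x) along (dy, dx).
        n = 0
        while 0 <= y < h and 0 <= x < w and pattern[y][x] == 1:
            n, y, x = n + 1, y + dy, x + dx
        return n

    mh, mw = h // mesh, w // mesh
    features = {}
    for my in range(mesh):
        for mx in range(mesh):
            acc, count = [0.0] * len(dirs), 0
            for y in range(my * mh, (my + 1) * mh):
                for x in range(mx * mw, (mx + 1) * mw):
                    if pattern[y][x] != 1:
                        continue
                    runs = [run(y, x, dy, dx) for dy, dx in dirs]
                    total = sum(runs)  # > 0: the starting pixel is white
                    for i, r in enumerate(runs):
                        acc[i] += r / total  # direction contributivity
                    count += 1
            features[(my, mx)] = ([a / count for a in acc] if count
                                  else [0.0] * len(dirs))
    return features
```

The black pixel variant in the next aspect is symmetric: the same counting is repeated over black pixels, and the two sets of feature values are then used together for recognition.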
According to another aspect of the present invention there is provided a method for recognizing character patterns, comprising the steps of: (aaa) dividing character patterns that are binarized into black pixels and white pixels into divided regions each containing one character; (bbb) normalizing position and size of a character in each divided region; (ccc) dividing a character pattern of each normalized character into mesh regions; (ddd) counting a run-length of black pixels which are adjacent in each direction starting from a black pixel existing in each divided mesh region, for a plurality of prescribed directions; (eee) calculating a black pixel direction contributivity of each direction as a value obtained by averaging the run-length of black pixels in each direction by an accumulated value of all the run-lengths of black pixels for all the prescribed directions as counted by the step (ddd), for each divided mesh region; (fff) counting a run-length of white pixels which are adjacent in each direction starting from a white pixel existing in each divided mesh region, for a plurality of prescribed directions; (ggg) calculating a white pixel direction contributivity of each direction as a value obtained by averaging the run-length of white pixels in each direction by an accumulated value of all the run-lengths of white pixels for all the prescribed directions as counted by the step (fff), for each divided mesh region; (hhh) calculating a black pixel feature value of each divided mesh region by accumulating the black pixel direction contributivity of each direction for all black pixels in each divided mesh region and averaging an accumulated value of the black pixel direction contributivity of each direction by a number of black pixels within each mesh region; (iii) calculating a white pixel feature value of each divided mesh region by accumulating the white pixel direction contributivity of each direction for all white pixels in each divided mesh region and 
averaging an accumulated value of the white pixel direction contributivity of each direction by a number of white pixels within each mesh region; and (jjj) carrying out a processing for recognizing the character pattern of each normalized character using the black feature values obtained for all the mesh regions at the step (hhh) and the white feature values obtained for all the mesh regions at the step (iii).
According to another aspect of the present invention there is provided an apparatus for processing video data, comprising: (a) a unit for entering each input frame constituting the video data; and (b) a unit for judging whether each input frame entered at the unit (a) is a telop character displaying frame in which telop characters are displayed or not, according to edge pairs detected from each input frame by detecting each two adjacent edge pixels for which intensity gradient directions are opposite on some scanning line used in judging an intensity gradient direction at each edge pixel and for which an intensity difference between said two adjacent edge pixels is within a prescribed range as one edge pair, edge pixels being pixels at which an intensity value locally changes by at least a prescribed amount with respect to a neighboring pixel among a plurality of pixels constituting each input frame.
According to another aspect of the present invention there is provided an apparatus for recognizing character patterns, comprising: (aaa) a unit for dividing character patterns that are binarized into black pixels and white pixels into divided regions each containing one character; (bbb) a unit for normalizing position and size of a character in each divided region; (ccc) a unit for dividing a character pattern of each normalized character into mesh regions; (ddd) a unit for counting a run-length of white pixels which are adjacent in each direction starting from a white pixel existing in each divided mesh region, for a plurality of prescribed directions; (eee) a unit for calculating a direction contributivity of each direction as a value obtained by averaging the run-length in each direction by an accumulated value of all the run-lengths for all the prescribed directions as counted by the unit (ddd), for each divided mesh region; (fff) a unit for calculating a feature value of each divided mesh region by accumulating the direction contributivity of each direction for all white pixels in each divided mesh region and averaging an accumulated value of the direction contributivity of each direction by a number of white pixels within each mesh region; and (ggg) a unit for carrying out a processing for recognizing the character pattern of each normalized character using the feature values obtained for all the mesh regions at the unit (fff).
According to another aspect of the present invention there is provided an apparatus for recognizing character patterns, comprising: (aaa) a unit for dividing character patterns that are binarized into black pixels and white pixels into divided regions each containing one character; (bbb) a unit for normalizing position and size of a character in each divided region; (ccc) a unit for dividing a character pattern of each normalized character into mesh regions; (ddd) a unit for counting a run-length of black pixels which are adjacent in each direction starting from a black pixel existing in each divided mesh region, for a plurality of prescribed directions; (eee) a unit for calculating a black pixel direction contributivity of each direction as a value obtained by averaging the run-length of black pixels in each direction by an accumulated value of all the run-lengths of black pixels for all the prescribed directions as counted by the unit (ddd), for each divided mesh region; (fff) a unit for counting a run-length of white pixels which are adjacent in each direction starting from a white pixel existing in each divided mesh region, for a plurality of prescribed directions; (ggg) a unit for calculating a white pixel direction contributivity of each direction as a value obtained by averaging the run-length of white pixels in each direction by an accumulated value of all the run-lengths of white pixels for all the prescribed directions as counted by the unit (fff), for each divided mesh region; (hhh) a unit for calculating a black pixel feature value of each divided mesh region by accumulating the black pixel direction contributivity of each direction for all black pixels in each divided mesh region and averaging an accumulated value of the black pixel direction contributivity of each direction by a number of black pixels within each mesh region; (iii) a unit for calculating a white pixel feature value of each divided mesh region by accumulating the white pixel direction 
contributivity of each direction for all white pixels in each divided mesh region and averaging an accumulated value of the white pixel direction contributivity of each direction by a number of white pixels within each mesh region; and (jjj) a unit for carrying out a processing for recognizing the character pattern of each normalized character using the black feature values obtained for all the mesh regions at the unit (hhh) and the white feature values obtained for all the mesh regions at the unit (iii).
According to another aspect of the present invention there is provided a computer readable recording medium recording a program for causing a computer to execute processing including: (a) a process for entering each input frame constituting the video data; and (b) a process for judging whether each input frame entered at the process (a) is a telop character displaying frame in which telop characters are displayed or not, according to edge pairs detected from each input frame by detecting each two adjacent edge pixels for which intensity gradient directions are opposite on some scanning line used in judging an intensity gradient direction at each edge pixel and for which an intensity difference between said two adjacent edge pixels is within a prescribed range as one edge pair, edge pixels being pixels at which an intensity value locally changes by at least a prescribed amount with respect to a neighboring pixel among a plurality of pixels constituting each input frame.
According to another aspect of the present invention there is provided a computer readable recording medium recording a program for causing a computer to execute processing including: (aaa) a process for dividing character patterns that are binarized into black pixels and white pixels into divided regions each containing one character; (bbb) a process for normalizing position and size of a character in each divided region; (ccc) a process for dividing a character pattern of each normalized character into mesh regions; (ddd) a process for counting a run-length of white pixels which are adjacent in each direction starting from a white pixel existing in each divided mesh region, for a plurality of prescribed directions; (eee) a process for calculating a direction contributivity of each direction as a value obtained by averaging the run-length in each direction by an accumulated value of all the run-lengths for all the prescribed directions as counted by the process (ddd), for each divided mesh region; (fff) a process for calculating a feature value of each divided mesh region by accumulating the direction contributivity of each direction for all white pixels in each divided mesh region and averaging an accumulated value of the direction contributivity of each direction by a number of white pixels within each mesh region; and (ggg) a process for carrying out a processing for recognizing the character pattern of each normalized character using the feature values obtained for all the mesh regions at the process (fff).
According to another aspect of the present invention there is provided a computer readable recording medium recording a program for causing a computer to execute processing including: (aaa) a process for dividing character patterns that are binarized into black pixels and white pixels into divided regions each containing one character; (bbb) a process for normalizing position and size of a character in each divided region; (ccc) a process for dividing a character pattern of each normalized character into mesh regions; (ddd) a process for counting a run-length of black pixels which are adjacent in each direction starting from a black pixel existing in each divided mesh region, for a plurality of prescribed directions; (eee) a process for calculating a black pixel direction contributivity of each direction as a value obtained by averaging the run-length of black pixels in each direction by an accumulated value of all the run-lengths of black pixels for all the prescribed directions as counted by the process (ddd), for each divided mesh region; (fff) a process for counting a run-length of white pixels which are adjacent in each direction starting from a white pixel existing in each divided mesh region, for a plurality of prescribed directions; (ggg) a process for calculating a white pixel direction contributivity of each direction as a value obtained by averaging the run-length of white pixels in each direction by an accumulated value of all the run-lengths of white pixels for all the prescribed directions as counted by the process (fff), for each divided mesh region; (hhh) a process for calculating a black pixel feature value of each divided mesh region by accumulating the black pixel direction contributivity of each direction for all black pixels in each divided mesh region and averaging an accumulated value of the black pixel direction contributivity of each direction by a number of black pixels within each mesh region; (iii) a process for calculating a white pixel 
feature value of each divided mesh region by accumulating the white pixel direction contributivity of each direction for all white pixels in each divided mesh region and averaging an accumulated value of the white pixel direction contributivity of each direction by a number of white pixels within each mesh region; and (jjj) a process for carrying out a processing for recognizing the character pattern of each normalized character using the black feature values obtained for all the mesh regions at the process (hhh) and the white feature values obtained for all the mesh regions at the process (iii).
Other features and advantages of the present invention will become apparent from the following description taken in conjunction with the accompanying drawings.