The present invention relates to a data sheet identification device that can be suitably used for a data sheet processing in a financial institution and others.
In recent years, there has been developed a data sheet identification device as a device for identifying a data sheet (a medium exclusively used for a recognition processing) based on a process of reading information on the data sheet as optical image information, processing the read image and then identifying the data sheet. This data sheet identification device has now been widely used by various industries to improve their operation efficiency.
In a financial institution and a like industry, operators at windows process data sheets by using data sheet identification devices. In order to improve the work efficiency of data sheet processing, a single data sheet identification device has been required not only to process a large volume of data sheets of the same kind but also to automatically process data sheets having various kinds of formats. As a data sheet identification device that meets this requirement, there has been a data sheet identification device that catches ruled lines printed on a data sheet as a feature for identifying the data sheet. The data sheet identification device that catches the ruled lines as the feature of the data sheet has been disclosed in PCT International Patent Publication No. WO97/05561.
According to the data sheet identification device disclosed in the above publication, a data sheet X shown in FIG. 35A is discriminated from a data sheet X′ shown in FIG. 35C based on a difference between ruled lines printed on both data sheets. In this example, an oval Ka portion is different from an oval Kb portion between the data sheet X and the data sheet X′. In other words, the data sheet X is different from the data sheet X′ in that while a ruled line does not exist at the oval Ka portion in the data sheet X, a ruled line exists at the oval Kb portion in the data sheet X′.
The operation of identifying the data sheet X shown in FIG. 35A will be explained next. First, the data sheet identification device optically reads an image (ruled lines, characters, graphics) printed on the data sheet X, and obtains image information. Then, the data sheet identification device processes the image information to extract only ruled-line information Xk shown in FIG. 35B. Next, the data sheet identification device collates the ruled-line information Xk with a database relating to ruled-line information of various data sheets, and identifies the data sheet X from among these various data sheets.
Similarly, for identifying the data sheet X′ shown in FIG. 35C, the data sheet identification device optically reads an image (ruled lines, characters, graphics) printed on the data sheet X′, and obtains image information. Then, the data sheet identification device processes the image information to extract only ruled-line information Xk′ shown in FIG. 35D. Next, the data sheet identification device collates the ruled-line information Xk′ with the database relating to ruled-line information of various data sheets, and identifies the data sheet X′ from among these various data sheets.
In this case, the ruled-line information Xk is different from the ruled-line information Xk′ in that an oval Ka′ portion is different from an oval Kb′ portion. In other words, while a ruled line does not exist at the oval Ka′ portion in the ruled-line information Xk, a ruled line exists at the oval Kb′ portion in the ruled-line information Xk′. Therefore, the data sheet identification device recognizes that the data sheet X and the data sheet X′ are different kinds of data sheets.
As explained above, according to the conventional data sheet identification device (PCT International Patent Publication No. WO97/05561), the data sheet identification device identifies data sheets based on ruled lines. Therefore, when the printing precision is poor on a certain data sheet, there has been a problem that this data sheet is identified by error as the same kind of data sheet as the other data sheet although they are actually different kinds of data sheets.
As a specific example, when the data sheet X′ shown in FIG. 35C has been printed in a state that the ruled line of the oval Kb portion has been blurred and dropped, the data sheet identification device obtains the ruled-line information Xk′ shown in FIG. 35D in a state that the ruled-line information at the oval Kb′ portion has been dropped. In other words, the data sheet identification device recognizes the ruled-line information Xk′ as the ruled-line information Xk (FIG. 35B), which is actually different from the ruled-line information Xk′. As a result, the data sheet identification device recognizes by error that the data sheet X and the data sheet X′ are the same kind of data sheet.
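The erroneous identification described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each ruled line is stored as a tuple of end-point coordinates and that collation is an exact comparison of line sets; the coordinates, the names RULED_LINE_DB and collate_ruled_lines, and the key X_prime (standing for the data sheet of FIG. 35C) are hypothetical and are not taken from the cited publication.

```python
# Hypothetical ruled-line database: sheet name -> set of ruled lines,
# each ruled line given as (x1, y1, x2, y2). All values are illustrative.
RULED_LINE_DB = {
    "X": frozenset({(0, 0, 100, 0), (0, 50, 100, 50)}),
    "X_prime": frozenset({(0, 0, 100, 0), (0, 50, 100, 50), (50, 0, 50, 50)}),
}


def collate_ruled_lines(extracted, database=RULED_LINE_DB):
    """Return the registered sheets whose ruled lines equal the extracted set."""
    extracted = frozenset(extracted)
    return sorted(name for name, lines in database.items() if lines == extracted)


# Sheet of FIG. 35C read correctly: all three ruled lines are extracted.
clean_scan = [(0, 0, 100, 0), (0, 50, 100, 50), (50, 0, 50, 50)]
# Same sheet with the ruled line at the oval Kb portion blurred and dropped.
blurred_scan = [(0, 0, 100, 0), (0, 50, 100, 50)]

print(collate_ruled_lines(clean_scan))    # ['X_prime']
print(collate_ruled_lines(blurred_scan))  # ['X']  (erroneous identification)
```

Once the ruled line is lost at printing, the extracted ruled-line information is indistinguishable from that of the data sheet X, so a collation that relies on ruled lines alone cannot recover the correct kind of data sheet.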
Further, in financial institutions and others, data sheets are also identified based on a difference between data sheet identification codes printed on data sheets, instead of based on a difference between formats like ruled lines printed on data sheets. The operation of identifying data sheets 1000A to 1000C shown in FIG. 36A to FIG. 36C based on data sheet identification codes will be explained next. In this case, a data sheet identification code is a 10-digit code of “customer code”.
A data sheet identification code of the data sheet 1000A is “1234567890”, and a data sheet identification code of the data sheet 1000B is “1234567890” which is the same as the data sheet identification code of the data sheet 1000A. On the other hand, a data sheet identification code of the data sheet 1000C is “9876543210” which is different from the data sheet identification codes of the data sheet 1000A and the data sheet 1000B. Therefore, in the financial institutions, the data sheet 1000A and the data sheet 1000B are handled as the same data sheets because of the same data sheet identification code.
However, among the data sheet 1000A to the data sheet 1000C, the ruled lines of the data sheet 1000A are the same as the ruled lines of the data sheet 1000C, and the ruled lines of the data sheet 1000A and the data sheet 1000C are different from the ruled lines of the data sheet 1000B. Therefore, according to the conventional data sheet identification device, there has been a problem that the data sheet 1000A and the data sheet 1000C are identified as the same data sheets by error because of their same ruled lines although the data sheet 1000A and the data sheet 1000B should actually be handled as the same data sheets.
It is an object of the present invention to provide a data sheet identification device having improved identification precision.
In order to achieve the above object, according to a first aspect of the present invention, there is provided a data sheet identification device comprising: character/graphics extracting unit (corresponding to a character/graphics extracting section 50 in a first embodiment to be described later) for extracting characters (including character strings) and graphics from image information of a data sheet that has been read by image reading unit; identical shape deciding unit (corresponding to an identical shape deciding section 60 in the first embodiment to be described later) for deciding whether or not there exist a plurality of characters and graphics having the same shape among a plurality of characters and graphics that have been extracted by the character/graphics extracting unit; graphic collating unit (corresponding to a graphics collating section 80 in the first embodiment to be described later) for collating graphics that have been decided to have the same shape with a graphic database in which a plurality of graphics showing features of a plurality of data sheets respectively have been registered; character collating unit (corresponding to an identification code/data sheet ID identifying section 150 in the first embodiment to be described later) for collating characters that have been decided to have the same shape with a character database in which a plurality of characters showing features of a plurality of data sheets respectively have been registered; and identifying unit (corresponding to an identifying section 230 in the first embodiment to be described later) for uniquely identifying the data sheets based on a result of the collation by the graphic collating unit and a result of the collation by the character collating unit.
According to the above aspect, when a plurality of characters and graphics have been extracted by the character/graphics extracting unit, the identical shape deciding unit makes a decision as to whether or not there exist a plurality of characters and graphics that have the same shape among these characters and graphics. Thus, the graphic collating unit collates the graphic database with the graphics that have been decided to have the same shape. In parallel with this operation, the character collating unit collates the character database with the characters that have been decided to have the same shape. The identifying unit uniquely identifies the data sheets based on a result of the collation by the graphic collating unit and a result of the collation by the character collating unit.
As described above, according to the first aspect, the data sheets are identified uniquely based on the result of the collation relating to graphics and characters that have been decided to have the same shapes respectively. Therefore, it is possible to correctly identify data sheets that are otherwise erroneously identified by the conventional identification method based on a result of the collation relating to ruled lines. As a result, it is possible to improve the precision of identification.
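The flow of the first aspect can be sketched as follows. This is a simplified illustration only: extracted characters and graphics are assumed to be already reduced to comparable labels, the identical shape decision is approximated by counting repeated labels, and the two collation results are combined by set intersection. The databases, labels, and function names are hypothetical, and the combination rule is an assumption rather than the method of the embodiments.

```python
from collections import Counter

# Hypothetical databases: registered feature -> data sheets showing that feature.
GRAPHIC_DB = {"logo_A": {"sheet_1", "sheet_2"}, "logo_B": {"sheet_3"}}
CHARACTER_DB = {"TRANSFER": {"sheet_1"}, "DEPOSIT": {"sheet_2", "sheet_3"}}


def decide_identical(items):
    """Simplified identical shape decision: keep items that occur more than once."""
    counts = Counter(items)
    return {item for item, n in counts.items() if n > 1}


def collate(features, database):
    """Collect every data sheet registered for at least one of the features."""
    candidates = set()
    for feature in features:
        candidates |= database.get(feature, set())
    return candidates


def identify(graphics, characters):
    """Return the single sheet supported by both collations, or None."""
    graphic_result = collate(decide_identical(graphics), GRAPHIC_DB)
    character_result = collate(decide_identical(characters), CHARACTER_DB)
    both = graphic_result & character_result  # assumed combination rule
    return next(iter(both)) if len(both) == 1 else None


print(identify(["logo_A", "logo_A", "stamp"], ["TRANSFER", "TRANSFER", "memo"]))
# sheet_1
```

In this sketch the graphic collation alone leaves two candidates and the character collation narrows them to one, which is the sense in which the two results are used together.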
Further, according to a second aspect of the invention, there is provided a data sheet identification device comprising: character/graphics extracting unit (corresponding to a character/graphics extracting section 50 in a second embodiment to be described later) for extracting characters (including character strings) and graphics from image information of a data sheet that has been read by image reading unit; identical shape deciding unit (corresponding to an identical shape deciding section 60 in the second embodiment to be described later) for deciding whether or not there exist a plurality of graphics having the same shape among a plurality of graphics that have been extracted by the character/graphics extracting unit; graphic collating unit (corresponding to a graphics collating section 80 in the second embodiment to be described later) for collating graphics that have been decided to have the same shape with a graphic database in which a plurality of graphics showing features of a plurality of data sheets respectively have been registered; identical character deciding unit (corresponding to an identical character string deciding section 310 in the second embodiment to be described later) for deciding whether or not there exist a plurality of the same characters among a plurality of characters that have been extracted by the character/graphics extracting unit; character collating unit (corresponding to an identification code/data sheet ID identifying section 150 in the second embodiment to be described later) for collating characters that have been decided to be the same with a character database in which a plurality of characters showing features of a plurality of data sheets respectively have been registered; and identifying unit (corresponding to an identifying section 230 in the second embodiment to be described later) for uniquely identifying the data sheets based on a result of the collation by the graphic collating unit and a result of the collation by the character collating unit.
According to the above aspect, when a plurality of characters and graphics have been extracted by the character/graphics extracting unit, the identical shape deciding unit makes a decision as to whether or not there exist a plurality of graphics that have the same shape among these graphics. Thus, the graphic collating unit collates the graphic database with the graphics that have been decided to have the same shape. In parallel with this operation, the identical character deciding unit makes a decision as to whether or not there exist a plurality of the same characters among the plurality of characters that have been extracted by the character/graphics extracting unit. Thus, the character collating unit collates the character database with the characters that have been decided to be the same. The identifying unit uniquely identifies the data sheets based on a result of the collation by the graphic collating unit and a result of the collation by the character collating unit.
As described above, according to the second aspect, the data sheets are identified uniquely based on the result of the collation relating to graphics that have been decided to have the same shape and the characters that have been decided to be the same. Therefore, it is possible to correctly identify data sheets that are otherwise erroneously identified by the conventional identification method based on a result of the collation relating to ruled lines. As a result, it is possible to improve the precision of identification.
Further, according to a third aspect of the invention, there is provided a data sheet identification device of the first or second aspect, wherein the character/graphics extracting unit extracts the characters and graphics from the image information after image adjustment has been applied to it.
According to the above aspect, the image information is adjusted, for example, by removing noise from the image information. Characters and graphics are then extracted from a result of this image adjustment. Therefore, it is possible to further improve the identification precision of the data sheets without being influenced by noise.
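One possible form of such an image adjustment is sketched below. It assumes a binary image held as a list of rows (1 for an ink pixel, 0 for background) and treats as noise any ink pixel with no ink pixel among its eight neighbours. The function name and the noise criterion are illustrative assumptions; the specification does not limit the adjustment to this method.

```python
def remove_isolated_pixels(image):
    """Clear every ink pixel (1) that has no ink pixel among its 8 neighbours."""
    height, width = len(image), len(image[0])
    adjusted = [row[:] for row in image]  # leave the input untouched
    for y in range(height):
        for x in range(width):
            if not image[y][x]:
                continue
            has_neighbour = any(
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_neighbour:
                adjusted[y][x] = 0
    return adjusted


noisy = [
    [1, 0, 0, 0],  # the pixel at the top left is an isolated speck
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]
print(remove_isolated_pixels(noisy))
# [[0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 0]]
```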
Further, according to a fourth aspect of the invention, there is provided a data sheet identification device of the first aspect, wherein the identical shape deciding unit makes a decision about characters and graphics of the same shape based on a result of a correction including at least a rotation, an expansion and a contraction carried out for those which are to be compared among the plurality of characters (including character strings) and graphics that have been extracted by the character/graphics extracting unit.
According to the above aspect, a correction including a rotation, an expansion and a contraction is carried out for characters and graphics that are to be compared among those that have been extracted. Therefore, it is possible to avoid an influence of image deterioration that is generated at the time of reading data sheets.
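The effect of such a correction can be illustrated with a stand-in technique. Instead of explicitly rotating, expanding or contracting one shape to fit the other, the sketch below compares a signature that does not change under rotation, uniform scaling or translation, namely the sorted distances of the points from their centroid divided by the largest distance. This signature is an assumption chosen for brevity; it is weaker than a true correction followed by comparison, and all names are hypothetical.

```python
import math


def signature(points):
    """Sorted centroid distances, normalised by the largest distance."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    distances = sorted(math.hypot(x - cx, y - cy) for x, y in points)
    largest = distances[-1] or 1.0  # avoid division by zero for a single point
    return [d / largest for d in distances]


def same_shape(a, b, tol=1e-6):
    """Compare two point sets regardless of rotation, scale and position."""
    if len(a) != len(b):
        return False
    return all(abs(p - q) <= tol for p, q in zip(signature(a), signature(b)))


def transform(points, angle_deg, scale, dx=0.0, dy=0.0):
    """Rotate, scale and shift a point set (used here to simulate a skewed read)."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return [(scale * (x * c - y * s) + dx, scale * (x * s + y * c) + dy)
            for x, y in points]


l_shape = [(0, 0), (0, 1), (0, 2), (1, 0)]
bar = [(0, 0), (0, 1), (0, 2), (0, 3)]
skewed = transform(l_shape, angle_deg=30, scale=2.0, dx=5, dy=-3)

print(same_shape(l_shape, skewed))  # True
print(same_shape(l_shape, bar))     # False
```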
Further, according to a fifth aspect of the invention, there is provided a data sheet identification device of the first aspect, wherein the identical shape deciding unit makes a decision about whether or not there exist a plurality of characters and graphics that are at least partly in the same shape among a plurality of characters (including character strings) and graphics that have been extracted by the character/graphics extracting unit.
According to the above aspect, a method of deciding the same shape can also cover characters and graphics that are partly in the same shape. Therefore, it is possible to correctly identify the data sheets even if characters and graphics have been partly damaged or lost due to the deterioration of the image.
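A decision that tolerates partial damage can be sketched as an overlap test. The sketch assumes that each extracted shape is a set of ink pixel coordinates already aligned to a common origin, and it accepts two shapes as partly the same when the shared pixels cover at least a given fraction of the larger shape. The threshold of 0.7 and the function name are illustrative assumptions.

```python
def partly_same_shape(a, b, min_overlap=0.7):
    """a, b: sets of (x, y) ink pixels aligned to a common origin."""
    if not a or not b:
        return False
    return len(a & b) / max(len(a), len(b)) >= min_overlap


intact = {(x, 0) for x in range(10)}   # a horizontal stroke of 10 pixels
damaged = {(x, 0) for x in range(8)}   # the same stroke with 2 pixels lost
other = {(0, y) for y in range(10)}    # a vertical stroke

print(partly_same_shape(intact, damaged))  # True
print(partly_same_shape(intact, other))    # False
```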
Further, according to a sixth aspect of the invention, there is provided a data sheet identification device of any one of the first to fifth aspects, the data sheet identification device further comprising: ruled-line extracting unit (corresponding to a ruled-line extracting section 170 in the first embodiment to be described later) for extracting ruled lines from the image information; and ruled-line collating unit (corresponding to a ruled-line collating section 180 in the first embodiment to be described later) for collating ruled-lines that have been extracted by the ruled-line extracting unit with a ruled-line database in which a plurality of ruled lines showing features of a plurality of data sheets respectively have been registered, wherein the identifying unit uniquely identifies the data sheets based on a result of the collation by the graphic collating unit, a result of the collation by the character collating unit, and a result of the collation by the ruled-line collating unit.
According to the above aspect, data sheets are identified by also taking into account a result of the collation relating to ruled lines. Therefore, it is possible to correctly identify data sheets based on a result of the collation relating to ruled lines even if it is not possible to identify the data sheets based on a result of the collation relating to characters and graphics.
Further, according to a seventh aspect of the invention, there is provided a data sheet identification device of the sixth aspect, the data sheet identification device further comprising: plane information extracting unit (corresponding to a plane extracting section 200 in the first embodiment to be described later) for extracting plane information including at least a filled area and a meshed area from the image information; and plane information collating unit (corresponding to a plane collating section 210 in the first embodiment to be described later) for collating plane information that has been extracted by the plane information extracting unit with plane information database in which a plurality of pieces of plane information showing features of a plurality of data sheets respectively have been registered, wherein the identifying unit uniquely identifies the data sheets based on a result of the collation by the graphic collating unit, a result of the collation by the character collating unit, a result of the collation by the ruled-line collating unit, and a result of the collation by the plane information collating unit.
According to the above aspect, data sheets are identified by also taking into account a result of the collation relating to plane information. Therefore, it is possible to correctly identify data sheets based on a result of the collation relating to the plane information even if it is not possible to identify the data sheets based on a result of the collation relating to characters and graphics.
Further, according to an eighth aspect of the invention, there is provided a data sheet identification device of the sixth or seventh aspect, wherein the identifying unit selects one of a plurality of results of collation according to a predetermined priority order, and uniquely identifies the data sheets based on the selected result of the collation.
According to the above aspect, a priority order is applied to a plurality of results of collation, and the data sheets are uniquely identified from a result of the collation based on the priority order. Therefore, it is possible to increase the variation in identification of data sheets.
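The use of a priority order can be sketched as follows. The sketch assumes that each collation yields a set of candidate data sheets and that the identifying unit adopts the first result, in priority order, that narrows the candidates to exactly one. The order shown and the rule of skipping ambiguous results are assumptions made for illustration.

```python
# Assumed priority order over the kinds of collation results.
PRIORITY = ("graphic", "character", "ruled_line", "plane")


def identify_by_priority(results, priority=PRIORITY):
    """Return (kind, sheet) for the first collation result with one candidate."""
    for kind in priority:
        candidates = results.get(kind, set())
        if len(candidates) == 1:
            (sheet_id,) = candidates
            return kind, sheet_id
    return None


results = {
    "graphic": {"sheet_1", "sheet_2"},  # ambiguous, skipped
    "character": set(),                 # nothing matched, skipped
    "ruled_line": {"sheet_2"},          # unique, selected
    "plane": {"sheet_3"},
}
print(identify_by_priority(results))  # ('ruled_line', 'sheet_2')
```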
Further, according to a ninth aspect of the invention, there is provided a data sheet identification device of any one of the first to eighth aspects, wherein the graphic collating unit collates graphics that have been decided to have the same shape and position information of the graphics with a graphic database in which a plurality of graphics and position information of the graphics showing features of a plurality of data sheets respectively have been registered.
According to the above aspect, graphics are collated by also taking into account position information of graphics. Therefore, it is possible to avoid an erroneous identification of data sheets due to a difference in position.
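Collation that takes position information into account can be sketched as below. The sketch assumes that the graphic database registers, for each data sheet, one graphic label and its reference position, and that a position matches when it lies within a tolerance of the registered position. The database contents, the tolerance, and the function name are hypothetical.

```python
import math

# Hypothetical graphic database: the same logo is printed at different
# positions on two kinds of data sheets.
GRAPHIC_POSITION_DB = {
    "sheet_1": {"graphic": "logo", "position": (10, 10)},
    "sheet_2": {"graphic": "logo", "position": (200, 10)},
}


def collate_with_position(graphic, position, tolerance=5.0,
                          database=GRAPHIC_POSITION_DB):
    """Return sheets whose registered graphic matches in both shape and position."""
    matches = []
    for sheet_id, entry in database.items():
        same_graphic = entry["graphic"] == graphic
        close_enough = math.dist(entry["position"], position) <= tolerance
        if same_graphic and close_enough:
            matches.append(sheet_id)
    return sorted(matches)


print(collate_with_position("logo", (12, 9)))    # ['sheet_1']
print(collate_with_position("logo", (100, 10)))  # []
```

Without the position check both sheets would match the logo, so the position is what separates the two candidates in this example.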
Further, according to a tenth aspect of the invention, there is provided a data sheet identification device of any one of the first and third to ninth aspects, wherein the character collating unit collates characters that have been decided to have the same shape and position information of the characters with a character database in which a plurality of characters and position information of the characters showing features of a plurality of data sheets respectively have been registered.
According to the above aspect, characters are collated by also taking into account position information of characters. Therefore, it is possible to avoid an erroneous identification of data sheets due to a difference in position.
Further, according to an eleventh aspect of the invention, there is provided a data sheet identification device of any one of the first and third to tenth aspects, wherein the character collating unit collates characters that have been decided to have the same shape and font information of the characters with a character database in which a plurality of characters and font information of the characters showing features of a plurality of data sheets respectively have been registered.
According to the above aspect, characters are collated by also taking into account font information of characters. Therefore, it is possible to avoid an erroneous identification of data sheets due to a difference in font information.
Further, according to a twelfth aspect of the invention, there is provided a data sheet identification device of any one of the first to eleventh aspects, wherein the character/graphics extracting unit extracts from the image information a part pattern in which pixels constituting a straight line portion of a ruled line and pixels constituting the characters are connected, and separates the straight line portion from the characters based on the part pattern, thereby to extract the characters.
According to the above aspect, even if characters exist on a ruled line, only the characters are extracted without affecting the ruled line. Therefore, it is possible to further increase the identification precision of the data sheets.
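The separation of a straight line portion from characters that touch it can be illustrated with a simplified sketch. It assumes a binary image held as a list of rows, a horizontal ruled line that is one pixel thick, and a rule that a line pixel is kept only where ink continues both directly above and directly below it (that is, where a character stroke crosses the line). The function name, the minimum run length, and this crossing rule are assumptions and do not reproduce the part pattern processing of the embodiments.

```python
def remove_horizontal_line(image, min_run):
    """Erase horizontal runs of at least min_run ink pixels, keeping crossings."""
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    for y in range(height):
        x = 0
        while x < width:
            if not image[y][x]:
                x += 1
                continue
            start = x
            while x < width and image[y][x]:
                x += 1  # advance to the end of the current run of ink pixels
            if x - start < min_run:
                continue  # short run: treated as part of a character
            for i in range(start, x):
                above = y > 0 and image[y - 1][i]
                below = y + 1 < height and image[y + 1][i]
                if not (above and below):
                    result[y][i] = 0  # pure line pixel, no stroke crossing here
    return result


# A vertical character stroke (column 2) crossing a ruled line (row 1).
image = [
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
]
for row in remove_horizontal_line(image, min_run=5):
    print(row)
# [0, 0, 1, 0, 0, 0]
# [0, 0, 1, 0, 0, 0]
# [0, 0, 1, 0, 0, 0]
```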
Other objects and features of this invention will become understood from the following description with reference to the accompanying drawings.