Field of the Invention
One disclosed aspect of the embodiments relates to a data processing apparatus, a data processing method, and a program for transferring image data before and after image processing.
Description of the Related Art
Generally, when image data formed by image forming processing is output, local (neighborhood) image processing such as spatial filtering is performed. In local image processing, a predetermined calculation is performed on a spatial filter area that includes the pixel to be processed (hereinafter referred to as the processing object pixel), using all or most of the pixels in the spatial filter area.
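The neighborhood calculation described above can be illustrated with a 3 x 3 weighted sum centred on the processing object pixel. This is a minimal sketch, not the method of any cited document. The function names `filter_pixel` and `spatial_filter` are hypothetical, and treating pixels outside the image as 0 is an assumed border policy.

```python
from typing import List

Image = List[List[int]]


def filter_pixel(img: Image, y: int, x: int, kernel: List[List[int]]) -> int:
    """Weighted sum over the spatial filter area centred on the processing
    object pixel (y, x). Pixels outside the image count as 0 (assumed policy)."""
    r = len(kernel) // 2
    acc = 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy < len(img) and 0 <= xx < len(img[0]):
                acc += kernel[dy + r][dx + r] * img[yy][xx]
    return acc


def spatial_filter(img: Image, kernel: List[List[int]]) -> Image:
    """Apply filter_pixel to every pixel, each in turn acting as the processing object pixel."""
    return [[filter_pixel(img, y, x, kernel) for x in range(len(img[0]))]
            for y in range(len(img))]
```

With an all-ones image and an all-ones 3 x 3 kernel, the centre pixel sums 9 neighbors, while edge and corner pixels sum only the 6 or 4 neighbors that lie inside the image.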
Japanese Patent Application Laid-Open No. 2006-139606 describes a technique for performing such local image processing as parallel distributed processing. In this technique, an image is segmented into regions in the sub-scanning direction (one-dimensional segmentation), and image processing is performed on the segmented areas sequentially or in a parallel distributed manner. Processing an image one such one-dimensional segmented area at a time is referred to as band processing.
Generally, in band processing, adjacent band areas are designed to partially overlap at their boundaries so that local image processing can be performed across band areas without gaps. According to Japanese Patent Application Laid-Open No. 2006-139606, when local image processing is performed on a band area, pixels are scanned one at a time in the height direction of the band area, so the capacity of the delay memory that stores the pixels needed for local image processing is determined by the height of the band area. In this way, Japanese Patent Application Laid-Open No. 2006-139606 reduces the size of the delay memory.
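The role of the overlap between band areas can be checked with a small model. In this sketch, each band is read together with `radius` extra rows above and below (clipped at the image edge), a 3-tap vertical filter is applied, and only the rows that belong to the band are kept; the result then matches filtering the whole image at once. The names `vfilter` and `band_process` and the zero border policy are hypothetical. The sketch models only the overlap, not the column-wise scan order that bounds the delay memory.

```python
from typing import List

Image = List[List[int]]


def vfilter(img: Image) -> Image:
    """3-tap vertical box sum; rows outside the given image count as 0 (assumed)."""
    h, w = len(img), len(img[0])
    return [[sum(img[yy][x] for yy in (y - 1, y, y + 1) if 0 <= yy < h)
             for x in range(w)] for y in range(h)]


def band_process(img: Image, band_height: int, radius: int = 1) -> Image:
    """Filter the image band by band in the sub-scanning (row) direction."""
    h = len(img)
    out: Image = []
    for top in range(0, h, band_height):
        bottom = min(top + band_height, h)
        # Read the band plus `radius` overlapping rows on each side.
        rd_top, rd_bottom = max(top - radius, 0), min(bottom + radius, h)
        filtered = vfilter(img[rd_top:rd_bottom])
        # Keep only the rows that belong to this band; the overlap rows are discarded.
        start = top - rd_top
        out.extend(filtered[start:start + (bottom - top)])
    return out
```

For any band height, `band_process(img, band_height)` equals `vfilter(img)`, which is the "without gaps" property the overlap is meant to provide.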
In addition, there is tile processing, in which an image is processed after two-dimensional region segmentation. Methods that segment an image into partial images and process them sequentially or in a parallel distributed manner, such as band processing and tile processing, are collectively referred to as region segmentation methods.
As described above, region segmentation processing, in which an image is segmented into partial images that are processed sequentially or in a parallel distributed manner, has various advantages, such as higher speed through parallel distributed processing and a smaller delay memory.
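Two-dimensional region segmentation for tile processing can be sketched as a list of tile rectangles. This is a minimal sketch with a hypothetical function name `tiles`; tiles at the right and bottom edges are assumed to be clipped to the image, and no overlap between tiles is modeled here.

```python
from typing import List, Tuple


def tiles(width: int, height: int, tile_w: int, tile_h: int) -> List[Tuple[int, int, int, int]]:
    """Return (x, y, w, h) for each tile in raster order; edge tiles are clipped."""
    rects = []
    for y in range(0, height, tile_h):
        for x in range(0, width, tile_w):
            rects.append((x, y, min(tile_w, width - x), min(tile_h, height - y)))
    return rects
```

For a 10 x 6 image with 4 x 4 tiles, this yields six tiles whose areas sum to 60 pixels, with narrower tiles in the last column and shorter tiles in the last row.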
One type of the local image processing described above is resolution conversion processing, which converts an input image into an image of a desired size by enlarging or reducing it. In resolution conversion processing, the number of output pixels can differ greatly from the number of input pixels, so the processing is more difficult to implement in hardware than other local image processing in which the number of pixels does not change between input and output. As described above, region segmentation processing has various advantages; however, when image processing such as resolution conversion is realized by region segmentation processing, hardware implementation becomes even more difficult.
For example, assume that image data of a sheet surface or the like is segmented into areas (for example, bands) and resolution conversion (arbitrary scaling) is performed on each segmented image area. When the magnification (variable magnification) of the resolution conversion is an integer, each converted image area is also an integer multiple of the input area, so it is not necessary to consider fractional pixels smaller than one pixel, in other words, a phase shift caused by the resolution conversion. On the other hand, when the magnification (variable magnification) is not an integer, a fractional pixel may be generated in the converted image area, depending on the size of the input image area. However, when the processed image area data is written into an external storage device (a global buffer), a fraction of a pixel cannot be written. It is therefore necessary to “round” the fractional pixel so that the number of pixels becomes an integer, for example by writing it as one whole pixel or by ignoring it and not writing it.
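The effect of rounding can be sketched by computing the output height of each band. This sketch assumes one particular rounding policy: each band's output ends at the floor of its exactly scaled boundary, so a partial pixel at a band end is not written by that band. The function name `output_band_heights` is hypothetical. `Fraction` keeps the magnification exact so that floating-point error does not affect the rounding.

```python
import math
from fractions import Fraction
from typing import List, Union


def output_band_heights(input_heights: List[int], scale: Union[int, Fraction]) -> List[int]:
    """Integer output height per band, rounding each cumulative band boundary down."""
    heights = []
    in_pos = 0
    out_prev = 0
    for h in input_heights:
        in_pos += h
        out_end = math.floor(in_pos * scale)  # exact scaled boundary, fraction dropped
        heights.append(out_end - out_prev)
        out_prev = out_end
    return heights
```

With an integer magnification of 2, four 16-line bands all become 32 lines. With a magnification of 3/5, each band would ideally be 9.6 lines, and the rounded output heights alternate as 9, 10, 9, 10, even though every input band has the same height.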
When fractional pixels generated by the image processing are rounded in this way, the setting values of the direct memory access (DMA) function used to transfer image data vary from one segmented area to the next, so the same setting values cannot be used continuously. For example, setting values such as the “top address” and the “repeat count” required to transfer image data using the DMA function must be calculated for each segmented area in consideration of the series of image processing. Conventionally, in such a case, firmware running on a central processing unit (CPU) first calculates the DMA setting values corresponding to the series of image processing for each segmented area. The CPU then sets the calculated values for each segmented area in turn and operates the image processing unit for each area.
For example, when a single image processing apparatus processes the front and rear surfaces of a sheet by time division multiplexing, as in simultaneous two-sided scanning, the image data of the front and rear surfaces must be segmented into areas (for example, bands) and processed alternately. In such usage, the DMA setting values described above must be calculated alternately for a front-surface area and a rear-surface area.
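The per-area calculation that firmware performs can be sketched as follows. This sketch assumes that one DMA repeat transfers one image line, that lines are stored contiguously at a fixed byte stride, and that the per-band line counts are already known (for example, after rounding). `DmaSetting` and `dma_settings` are hypothetical names and do not describe the register layout of any real DMAC.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DmaSetting:
    top_address: int   # byte address of the first line of this band (hypothetical field)
    repeat_count: int  # number of line transfers for this band (hypothetical field)


def dma_settings(base_address: int, line_stride: int, band_heights: List[int]) -> List[DmaSetting]:
    """Compute one DMA setting per band from the rounded output line counts."""
    settings = []
    line = 0
    for h in band_heights:
        settings.append(DmaSetting(base_address + line * line_stride, h))
        line += h  # the next band starts where this one ended
    return settings
```

With rounded band heights such as 9, 10, 9, 10 lines, both the top address and the repeat count change from band to band, which is why a single setting cannot be reused for every segmented area.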
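A minimal sketch of the alternating order is shown below. It assumes that each surface's per-band DMA settings have already been computed as separate lists. The function name `alternate` and the "front"/"rear" tags are hypothetical, and appending the remaining bands of the longer list in order is also an assumption.

```python
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def alternate(front: List[T], rear: List[T]) -> List[Tuple[str, T]]:
    """Interleave per-band settings as front, rear, front, rear, ..."""
    order: List[Tuple[str, T]] = []
    for i in range(max(len(front), len(rear))):
        if i < len(front):
            order.append(("front", front[i]))
        if i < len(rear):
            order.append(("rear", rear[i]))
    return order
```

Each entry pairs a surface with the setting to load before processing that band, so the settings must be switched at every step rather than computed once per page.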
Generally, an image processing apparatus consists of various units, such as a CPU that controls the entire apparatus, an external storage device that stores the pixels to be processed, a direct memory access controller (DMAC) that transfers image data in units of processing, and an image processing unit that executes image processing. These units operate simultaneously and/or in cooperation with each other to realize the desired image processing at the desired speed. When image data to be processed is input to the image processing apparatus, part of the image data is sequentially read from the external storage device and temporarily stored in an input local buffer in the image processing unit, and the image processing is executed. When processed image data is output, part of the processed image data is temporarily stored in an output local buffer in the image processing unit and sequentially written into the external storage device. In this way, the image processing apparatus exchanges image data with the external storage device via the input and output local buffers, which enables the units in the apparatus to operate simultaneously and in linkage with one another.
The CPU must control the DMAC in units of partial images, in cooperation (synchronization) with other devices, to input each partial image to the image processing unit. For example, when a scanner device segments an A4 size sheet read at 600 dpi (6600 pixels in the vertical direction) into band areas 16 pixels high, there are about 412 band areas (6600 / 16 = 412.5). In other words, the CPU needs to control the DMAC about 412 times per page. Because similar control is also necessary for the output from the image processing unit, the CPU controls the DMAC about 824 times per page. As the number of segments grows, the control load on the CPU becomes heavier, and overhead such as communication delays for cooperating (synchronizing) with other devices and interrupt latency increases. Accordingly, it becomes difficult to achieve high speed (real-time performance) in the image processing apparatus.
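The data flow through the local buffers can be modeled sequentially. The hypothetical function `run_through_local_buffers` treats Python lists as the external storage device and the local buffers and processes one chunk at a time. Real units overlap the read, process, and write stages, and this model does not attempt to show that overlap.

```python
from typing import Callable, List


def run_through_local_buffers(external_in: List[int], chunk: int,
                              process: Callable[[List[int]], List[int]]) -> List[int]:
    """Sequential model: read DMA -> image processing -> write DMA, one chunk at a time."""
    external_out: List[int] = []
    for start in range(0, len(external_in), chunk):
        # Read DMA: external storage device -> input local buffer.
        input_local = list(external_in[start:start + chunk])
        # Image processing execution on the contents of the input local buffer.
        output_local = process(input_local)
        # Write DMA: output local buffer -> external storage device.
        external_out.extend(output_local)
    return external_out
```

Passing a process that keeps every second value shows that the amount written back per chunk need not equal the amount read, as in resolution conversion.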
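The per-page counts above follow from simple integer arithmetic. In this sketch, the function names are hypothetical, and counting one input and one output DMAC control per band is the assumption made in the text. The sketch also shows that a final partial band, if it is processed, adds one more band.

```python
from typing import Tuple


def band_counts(page_lines: int, band_height: int) -> Tuple[int, int]:
    """Return (full bands, bands including a final partial band)."""
    full = page_lines // band_height
    total = -(-page_lines // band_height)  # integer ceiling division
    return full, total


def dmac_controls_per_page(bands: int) -> int:
    """One input-side and one output-side DMAC control per band."""
    return 2 * bands
```

For 6600 lines and 16-line bands, this gives 412 full bands, 413 bands if the last half band is also processed, and 824 DMAC controls for the 412 full bands.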
To address these issues, it is necessary to reduce both the overhead of cooperation (synchronization) with other devices, which has conventionally been handled by the CPU, and the processing load of controlling the DMAC. For example, according to Japanese Patent Application Laid-Open No. 2010-282429, the image processing unit reads a command list from an external memory and uses it to control the DMAC autonomously, without intervention by the CPU. According to Japanese Patent Application Laid-Open No. 2011-101093, a command list for starting the image processing unit in cooperation (synchronization) with other devices is introduced, and the overhead of cooperation (synchronization) is reduced by starting the DMAC.
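The idea of a unit walking a command list without CPU intervention can be sketched as a small interpreter. The command types `SetDma`, `WaitSync`, and `End`, and the function `run_command_list`, are hypothetical illustrations. They are not taken from the cited publications, which define their own command formats. The interpreter returns a trace instead of touching hardware registers.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class SetDma:
    top_address: int
    repeat_count: int


@dataclass
class WaitSync:
    source: str  # e.g. "scanner"; the name of the device to synchronize with


@dataclass
class End:
    pass


Command = Union[SetDma, WaitSync, End]


def run_command_list(commands: List[Command]) -> List[str]:
    """Execute commands in order until End, recording what a unit would do."""
    trace: List[str] = []
    for cmd in commands:
        if isinstance(cmd, SetDma):
            trace.append(f"dma top={cmd.top_address:#x} repeat={cmd.repeat_count}")
        elif isinstance(cmd, WaitSync):
            trace.append(f"wait {cmd.source}")
        elif isinstance(cmd, End):
            trace.append("end")
            break  # commands after End are never executed
    return trace
```

Because the per-band DMA settings are prepared in memory ahead of time, the CPU writes the list once instead of intervening for every band.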
In Japanese Patent Application Laid-Open No. 2011-101093 as well, the DMAC and the image processing unit are separate, as in a general image processing apparatus. Further, synchronization between the image processing unit (an image input output unit and an image processing execution unit) and the external devices (the CPU, the scanner, and the video input device) is performed via the DMAC and is triggered by the completion of DMA transfer. Thus, the DMAC must be started at each control point (synchronization point) with the external devices (the CPU, the scanner, and the video input device). Naturally, the control points (synchronization points) are placed according to the transfer amount (unit of transfer) of the DMAC.
On the other hand, the image processing unit (the image input output unit and the image processing execution unit) operates according to an image processing data flow. The control points (synchronization points) of task switching, barrier synchronization, and the like performed by the image processing execution unit according to this data flow do not always coincide with the control points (synchronization points) at which DMAC transfer completes. Thus, to synchronize the external devices (the CPU and the scanner) appropriately with the image processing unit (the image input output unit and the image processing execution unit), the DMAC transfer amount must be divided into smaller units so that the control points (synchronization points) of DMA transfer completion can easily be aligned with the control points (synchronization points) of task switching and barrier synchronization in the image processing. However, making the unit of DMA transfer smaller decreases DMA transfer efficiency. In addition, the external device (the CPU) controls the DMAC at each control point (synchronization point) of DMA transfer completion. Accordingly, the control load on the external device (the CPU) increases because the number of these control points (synchronization points) increases.
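The trade-off can be quantified with a simple model. This sketch assumes that an external device synchronizes every 64 lines and that the image processing switches tasks every 24 lines; both intervals are hypothetical. If DMA transfers complete at uniform intervals from the start of the page, every synchronization and task-switch point falls on a completion only if the transfer unit divides both intervals, so the largest usable unit is their greatest common divisor. The function names are hypothetical.

```python
import math


def aligned_transfer_unit(external_sync_lines: int, task_switch_lines: int) -> int:
    """Largest transfer unit (in lines) whose completions hit every point of both kinds."""
    return math.gcd(external_sync_lines, task_switch_lines)


def control_points(page_lines: int, transfer_lines: int) -> int:
    """DMA transfer completions per page (CPU control points), counting a partial last transfer."""
    return -(-page_lines // transfer_lines)
```

Under these assumptions, the transfer unit shrinks from 64 lines to 8 lines, and the number of control points for a 6600-line page grows from 104 to 825.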
In particular, in usage such as simultaneous two-sided scanning, the scanner sensors are installed at different positions relative to the front and rear surfaces of the sheet, so the reading start position of the image processing differs between the front and rear surfaces. The same applies when different reading ranges are designated for the front and rear surfaces. Further, if the image format (dot sequential system, frame sequential system, or number of colors) used for the image processing differs between the front and rear surfaces, the image processing naturally needs to be executed on each surface with its own appropriate setting values. To process the front and rear surfaces in a time division multiplexed manner with a single image processing apparatus for these purposes, the image processing must be switched alternately for each segmented area of the front and rear surfaces. At the same time, as with the arbitrary scaling described above, the setting values of the DMA function used to transfer image data vary for each segmented area, so the same setting values cannot be used continuously. It is therefore even more difficult to solve both issues at the same time.
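Per-surface differences in reading start position and image format can be sketched as separate configurations that yield different band top addresses. `SurfaceConfig` and `band_top_address` are hypothetical names. The sketch assumes that each surface has its own page buffer, a reading start line, and a line stride of width times bytes per pixel (for example, 3 bytes for dot-sequential RGB and 1 byte for one plane of a frame-sequential image).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SurfaceConfig:
    name: str
    base_address: int     # start of this surface's page buffer (assumed layout)
    start_line: int       # first line to read; differs with the sensor position or reading range
    width: int            # pixels per line
    bytes_per_pixel: int  # depends on the image format (assumed values)

    @property
    def stride(self) -> int:
        return self.width * self.bytes_per_pixel


def band_top_address(cfg: SurfaceConfig, band_index: int, band_height: int) -> int:
    """Top address of a band for one surface, given that surface's own settings."""
    line = cfg.start_line + band_index * band_height
    return cfg.base_address + line * cfg.stride
```

Because the start line and stride differ by surface, alternating between the front and rear surfaces means that the address computed for one band cannot be derived from the previous band's setting by a single fixed increment.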
The image processing unit executes a series of image processing for each band, synchronizing with the external devices (the CPU, the scanner device, a print device, and the like) each time it finishes processing one band area. The image processing unit then remains in a wait state until an external synchronization signal is input from the external device and the image processing on the next band area starts. However, the clock continues to be supplied to the image processing execution unit during the wait state, so the image processing execution unit wastes power. When the external device is a hardware device such as a scanner, the wait time between bands is too short for firmware to intervene. Therefore, if firmware intervenes to stop the clock, control overhead is generated, and the overall image processing time increases. On the other hand, if 412 band areas are processed per page of an A4 size sheet and an automatic document feeder (ADF) scanner operates continuously for 50 pages, wait states occur more than 20,000 times (412 × 50 = 20,600), and the total wasted power is large.
The conventional techniques described above do not describe controlling the clock during the wait state between the image processing of one band area and that of the next band area, which is too short for firmware to intervene in.
As described above, when firmware calculates the DMA setting values for each segmented area, CPU processing is required for every segmented area, so the image processing unit cannot process a plurality of segmented areas continuously.
In addition, when the operations of a plurality of units are controlled through data transfer to a global buffer via a local buffer, and the numbers of input and output pixels vary for each segmented area, as in resolution conversion processing, the data transfer cannot be defined uniformly. Therefore, for example, firmware calculates the number of output pixels for each segmented area, changes the DMAC settings for each band, and starts the image processing unit. Further, when the image processing for one band is completed, the image processing unit must notify the CPU of completion using an interrupt or the like to synchronize with the CPU. The CPU takes, for example, a few milliseconds to receive the interrupt, identify its cause, and return to the next operation. Accordingly, a certain load is constantly placed on the CPU for controlling and synchronizing with the image processing unit, and the CPU is not released during the image processing. As a result, the simultaneous operation of the units in the apparatus is delayed, and the linkage operation between the units cannot be executed continuously.
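The number of wait states, and the idle clock cycles they consume if the clock keeps running, can be computed directly. In this sketch, the function names and the per-wait cycle count are hypothetical; the text gives no wait duration, so `wait_cycles_per_band` is purely illustrative.

```python
def wait_states(bands_per_page: int, pages: int) -> int:
    """One inter-band wait per processed band area."""
    return bands_per_page * pages


def idle_clock_cycles(bands_per_page: int, pages: int, wait_cycles_per_band: int) -> int:
    """Clock cycles supplied to the execution unit while it only waits (ungated clock)."""
    return wait_states(bands_per_page, pages) * wait_cycles_per_band
```

For 412 bands per page over 50 pages, there are 20,600 wait states; at an assumed 1,000 cycles per wait, that is 20,600,000 clock cycles delivered while no band is being processed.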
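The CPU time spent on per-band completion interrupts can be estimated with simple arithmetic. This sketch assumes one completion interrupt per band and a hypothetical latency of 2 ms (2,000 microseconds) per interrupt to stand in for "a few milliseconds"; the function name is also hypothetical. Integer microseconds avoid floating-point rounding.

```python
def interrupt_overhead_us(bands_per_page: int, pages: int, latency_us_per_interrupt: int) -> int:
    """Total CPU time (microseconds) spent handling band completion interrupts."""
    return bands_per_page * pages * latency_us_per_interrupt
```

Under these assumptions, one page of 412 bands costs 824,000 microseconds (about 0.82 s) of interrupt handling, and a 50-page ADF job costs 41,200,000 microseconds (about 41 s).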