The digitalization of information has led to the development of ways of storing and visualizing graphical information. The storage and visualization of graphical information, i.e. images in a two-dimensional space, is based on two main types of graphical information: vector graphics and raster graphics (also known as bitmap graphics).
Vector graphics is a type of computer graphics based on the idea that objects are tied to coordinates. The objects are, for example, lines, polygons, circles and other geometrical shapes. The characteristics and shapes of the objects are described as coordinates and mathematical expressions, such as mathematical formulas. For example, to create a circle with vector graphics, an indication that a circle is to be created is needed, and the location of the center of the circle in a coordinate system and the radius of the circle are defined. Additionally, the line style and color and the fill style and color are to be defined in the formula. By modifying the values in the formula, the object can be reshaped. Images based on vector graphics are well suited for showing ordered, exact and constructed items like text and diagrams. Images expressed by means of vector graphics are called vector images.
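The circle example above can be sketched in Python as follows. This is only an illustration: the class name `Circle`, its field names and the color strings are assumptions made for this sketch and do not come from any particular vector format.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Circle:
    """A vector circle: the shape is stored as parameters, not as pixels."""
    cx: float                     # center x in the coordinate system
    cy: float                     # center y in the coordinate system
    radius: float
    stroke_color: str = "black"   # line color (hypothetical naming)
    stroke_style: str = "solid"   # line style (hypothetical naming)
    fill_color: str = "none"      # fill color (hypothetical naming)
    fill_style: str = "solid"     # fill style (hypothetical naming)

    def contains(self, x: float, y: float) -> bool:
        """True if the point lies inside or on the circle."""
        return (x - self.cx) ** 2 + (y - self.cy) ** 2 <= self.radius ** 2


circle = Circle(cx=10.0, cy=10.0, radius=5.0,
                stroke_color="blue", fill_color="yellow")

# Reshaping the object only means changing a value in its description.
bigger = replace(circle, radius=8.0)
```

Because the circle is kept as a handful of numbers, enlarging it changes one value and loses no detail.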
In raster graphics, in turn, the graphical information is expressed as a regular grid of colored points. More specifically, a raster image is composed of picture elements, i.e. pixels, which are arranged in a grid structure that is typically rectangular. Each picture element has a predefined location in the grid, and the color of the picture element is defined with color values. A raster image expressed as an array of data in which each point occupies one array element is called an uncompressed raster image. The main benefit of this representation is that each point in the raster can be accessed directly through the corresponding array element. The main drawback is that when the array is stored or transferred, the amount of raw data to be handled may be too large for the medium in question. A raster image can be manipulated in multiple ways, e.g. by adding picture elements to or removing them from the image, or by optimizing the characteristics of the picture elements as needed. Such manipulation affects the size of the image, i.e. how much memory space is needed for storing the image data. Generally speaking, raster images are well suited for showing disordered, approximate and natural items like landscapes and physical objects, typically originating from photos or scans.
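A minimal sketch of an uncompressed raster image may make the trade-off concrete. It assumes 3 bytes per pixel (red, green, blue) stored row by row, and a 640 by 480 grid. Both choices, like the helper names, are assumptions of this sketch only.

```python
BYTES_PER_PIXEL = 3  # assumed layout: one byte each for red, green, blue


def make_raster(width, height, color=(255, 255, 255)):
    """Return an uncompressed raster as a flat bytearray, stored row by row."""
    return bytearray(bytes(color) * (width * height))


def pixel_offset(width, x, y):
    """Index of the first byte of pixel (x, y) in the flat array."""
    return (y * width + x) * BYTES_PER_PIXEL


def set_pixel(raster, width, x, y, color):
    i = pixel_offset(width, x, y)
    raster[i:i + BYTES_PER_PIXEL] = bytes(color)


def get_pixel(raster, width, x, y):
    i = pixel_offset(width, x, y)
    return tuple(raster[i:i + BYTES_PER_PIXEL])


WIDTH, HEIGHT = 640, 480
raster = make_raster(WIDTH, HEIGHT)

# Direct access: any pixel is reached by computing its array offset.
set_pixel(raster, WIDTH, 10, 20, (255, 0, 0))

# The drawback: the raw size grows with the number of pixels.
raw_size = len(raster)  # 640 * 480 * 3 bytes
```

Reading or writing a pixel costs one offset calculation, but even this small image occupies 921,600 bytes before any compression.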
In the context of raster images, the density of the raster grid points is measured as resolution. The resolution indicates how many points fit inside a given area or along a given line. Resolution is typically expressed in dots per inch (dpi), which literally tells how many points fit on a line one inch long. Resolution is an important issue in the manipulation and visualization of digital raster images. Depending on the need, the resolution may be increased or decreased. However, resolution is not an issue in vector graphics: because the content is expressed with mathematical formulas, a change of resolution has no effect on the image objects. Thus, vector images are often called resolution-independent images. In contrast, a change of resolution significantly affects the visual quality of raster images, which are therefore often called resolution-dependent images.
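The relationship between resolution, pixel count and raw data size can be worked through with a short calculation. The page size (8.5 by 11 inches), the two dpi values and the 3 bytes per pixel are assumptions chosen for this example.

```python
def pixels_for(length_inches, dpi):
    """Number of raster points that fit along a line of the given length."""
    return round(length_inches * dpi)


def raw_bytes(width_inches, height_inches, dpi, bytes_per_pixel=3):
    """Uncompressed size of a raster covering the given area."""
    return (pixels_for(width_inches, dpi)
            * pixels_for(height_inches, dpi)
            * bytes_per_pixel)


# Assumed example: an 8.5 x 11 inch page at two resolutions.
size_300 = raw_bytes(8.5, 11, 300)   # 2550 x 3300 pixels
size_150 = raw_bytes(8.5, 11, 150)   # 1275 x 1650 pixels
```

Halving the resolution halves the point count along each axis, so the raw data shrinks to a quarter: about 25.2 MB at 300 dpi against about 6.3 MB at 150 dpi.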
Both image types are used in storing and visualizing graphical information. The vast majority of modern display devices are so-called raster displays, which show a regular grid of colored points. Although the arrangement and shape of these points vary, the concepts of density and resolution that apply to raster images apply directly to raster display devices, too. Raster displays can display both vector and raster graphics, but the process required to do so differs. The process of displaying vector graphics on a raster display is called scan line conversion. It resolves where freeform vector graphic shapes intersect the raster display points, which are arranged into straight lines, and what the color at each such intersection point is. The process of displaying raster graphics on a raster display is called raster resampling. It resolves which color to apply to each raster display point by interpolating between the colors at neighboring raster image points. An important special case of resampling arises when the resolution and point arrangement of the raster image and the raster display match perfectly. The resampling operation then reduces to a simple copy operation.
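Both processes can be illustrated with a small sketch. It makes several simplifying assumptions: the raster is a single-channel grid held as a list of rows, the only vector shape is a filled circle, each pixel is sampled at its center, and bilinear interpolation is used as one possible interpolation rule. The function names are hypothetical.

```python
import math


def scanline_fill_circle(width, height, cx, cy, r, fg=1, bg=0):
    """Scan line conversion of a filled circle into a width x height grid."""
    grid = [[bg] * width for _ in range(height)]
    for y in range(height):
        dy = (y + 0.5) - cy                # scan line through the pixel centers
        if abs(dy) > r:
            continue                       # this scan line misses the circle
        half = math.sqrt(r * r - dy * dy)  # half-width of the intersection
        # Fill the pixels whose centers (x + 0.5) lie inside the span.
        x_start = max(0, math.ceil(cx - half - 0.5))
        x_end = min(width - 1, math.floor(cx + half - 0.5))
        for x in range(x_start, x_end + 1):
            grid[y][x] = fg
    return grid


def resample_bilinear(src, dst_w, dst_h):
    """Resample a grid by interpolating between neighboring source points."""
    src_h, src_w = len(src), len(src[0])
    dst = []
    for j in range(dst_h):
        # Map the destination pixel center into source coordinates.
        sy = min(max((j + 0.5) * src_h / dst_h - 0.5, 0.0), src_h - 1.0)
        y0 = int(sy)
        y1 = min(y0 + 1, src_h - 1)
        fy = sy - y0
        row = []
        for i in range(dst_w):
            sx = min(max((i + 0.5) * src_w / dst_w - 0.5, 0.0), src_w - 1.0)
            x0 = int(sx)
            x1 = min(x0 + 1, src_w - 1)
            fx = sx - x0
            top = src[y0][x0] * (1 - fx) + src[y0][x1] * fx
            bottom = src[y1][x0] * (1 - fx) + src[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        dst.append(row)
    return dst


disc = scanline_fill_circle(8, 8, cx=4.0, cy=4.0, r=3.0)
same_size = resample_bilinear(disc, 8, 8)       # matching grids: a plain copy
stretched = resample_bilinear([[0, 100]], 4, 1)
```

When source and destination grids match, every interpolation weight collapses to 0 or 1, so `same_size` equals `disc`. This is the copy special case mentioned above.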
In order to display any visual content on the screens of computing devices, or to store it in a bitmap image file, the content goes through the scan line conversion (vector graphics) or raster resampling (raster graphics) step, as described above. These steps transform the visual content so that it can be rasterized into a raster canvas having a certain pixel density (resolution). The term ‘raster canvas’ means a raster image kept and maintained in uncompressed form in the memory of a computing unit, such as a computer. The visual content in the raster canvas can then be displayed on screen or stored in a bitmap image file. When the data is stored in a memory structure or a file, different data compression schemes may be applied to optimize the size of the resulting bitmap image file. There are various ways to compress bitmap data and thus to influence the quality-to-size ratio of the resulting bitmap image file. It is often desirable to minimize the image file size in order to make the file more affordable and practical to store or to transmit to a remote device. The time it takes to transmit a file depends on the file size, and over slow network connections such as mobile networks this has a direct impact on the overall user experience, i.e. how long the user needs to wait in order to see a certain image file.
There are various compression methods available for compressing image data to meet the needs of transferring and storing information. The compression methods can be divided into two categories. The first category covers so-called lossless data compression methods, used in popular bitmap file formats like PNG and TIFF. Lossless compression methods seek to preserve the data as it is, and in the case of diverse, colorful image content the resulting image file is often quite large. Thus, prior to using lossless compression methods, it is often practical to apply pre-processing steps in order to optimize the image data. This is done, for example, by reducing the number of colors used in the image (quantizing the color map) or by discarding some details through a reduction of the image resolution. If these pre-processing steps are done appropriately, their effects can be almost unnoticeable to the human eye. Generally speaking, lossless compression tends to work better with images that use a limited number of colors and contain large solid color areas.
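The behavior described above can be observed with Python's standard `zlib` module, which implements DEFLATE, the lossless method used inside PNG. The sketch below is not a PNG encoder. It compresses raw pixel bytes directly, and the image size, the random "noisy" content and the `quantize` helper are assumptions made for illustration.

```python
import random
import zlib

WIDTH, HEIGHT = 64, 64

# A solid color area: the same RGB triple repeated for every pixel.
solid = bytes([200, 30, 30]) * (WIDTH * HEIGHT)

# Diverse content, stood in for here by seeded pseudo-random bytes.
rng = random.Random(0)
noisy = bytes(rng.randrange(256) for _ in range(WIDTH * HEIGHT * 3))


def quantize(raw, levels=4):
    """Pre-processing: reduce each color channel to a few allowed values."""
    step = 256 // levels
    return bytes((v // step) * step + step // 2 for v in raw)


solid_packed = zlib.compress(solid, 9)
noisy_packed = zlib.compress(noisy, 9)
quantized_packed = zlib.compress(quantize(noisy), 9)

# Lossless: decompression restores the input byte for byte.
restored = zlib.decompress(solid_packed)
```

The solid area shrinks to a tiny fraction of its raw size, the noisy data barely shrinks at all, and quantizing the noisy data first makes it compress noticeably better. Note that the quantization step itself discards information. Only the compression step that follows it is lossless.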
The second category covers so-called lossy data compression methods, used in bitmap file formats such as JPEG. Lossy compression methods are often used for compressing colorful, digitized images like photos or scans. In such cases, lossy compression algorithms offer ways to balance image quality against the size of the resulting bitmap image file. At high compression ratios, lossy compression algorithms introduce various compression artifacts into the output image. These are especially visible in solid color areas and near areas that contain high-contrast changes between nearby colors. Therefore, if the image contains small details to be preserved, only low compression ratios can be used with lossy compression algorithms. Otherwise, lossy compression will lose some details, introduce visible compression artifacts, and cause a significant loss of observed overall image quality. Compression artifacts typically smudge areas that should be uniformly and smoothly colored, and blur the edges and fine details of small items. This often significantly affects printed text, because letters and other symbols are typically uniformly colored and consist of many sharp edges and other fine details. Lossy compression methods, although more effective in reducing the amount of data required to express the images, are therefore not well suited for compressing raster images containing line drawings and text. Photo raster images typically contain fewer uniformly colored areas or sharp edges and are therefore far less in danger of being affected by compression artifacts. Lossy compression methods are therefore well suited for compressing photos and similar content.
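The effect on sharp edges can be demonstrated with a toy transform-coding sketch. It is not JPEG. It applies a one-dimensional discrete cosine transform (DCT) to a single row of 8 samples and simply discards all but the lowest `keep` coefficients, whereas JPEG works on 8 by 8 blocks and quantizes the coefficients. The sample rows and function names are assumptions of this illustration.

```python
import math

N = 8  # samples per block


def _scale(k):
    """Normalization that makes the transform orthonormal."""
    return math.sqrt((1 if k == 0 else 2) / N)


def dct(samples):
    """Forward DCT-II of one block of N samples."""
    return [_scale(k) * sum(x * math.cos(math.pi * (n + 0.5) * k / N)
                            for n, x in enumerate(samples))
            for k in range(N)]


def idct(coeffs):
    """Inverse transform: rebuild N samples from N coefficients."""
    return [sum(_scale(k) * c * math.cos(math.pi * (n + 0.5) * k / N)
                for k, c in enumerate(coeffs))
            for n in range(N)]


def lossy_roundtrip(samples, keep):
    """Keep only the `keep` lowest-frequency coefficients, then decode."""
    coeffs = dct(samples)
    return idct(coeffs[:keep] + [0.0] * (N - keep))


def max_error(original, restored):
    return max(abs(a - b) for a, b in zip(original, restored))


flat = [128] * N                          # uniform area with no edge
edge = [0, 0, 0, 0, 255, 255, 255, 255]   # sharp edge, as in rendered text

flat_error = max_error(flat, lossy_roundtrip(flat, keep=2))
edge_error = max_error(edge, lossy_roundtrip(edge, keep=2))
```

With two coefficients kept, the uniform row is reconstructed essentially exactly, while the sharp edge is smeared into a gradual ramp: the samples next to the edge are off by roughly 95 of 255 levels, and the formerly uniform runs on either side of it are no longer uniform.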
As can be concluded from the description above, in order to maintain the visual quality of a raster image, the compression method is to be chosen at least partly in accordance with the types of content rendered into the raster canvas. For example, if a lossy compression method is heavily applied to textual content, the compression affects sharpness and clarity, thus reducing the readability of the textual content. Similarly, if a lossless compression method is applied, for example, to a color photo, the resulting file size remains large due to the amount of detail preserved. In fact, with photos there may not be any specific need for such a detailed presentation. The challenge arises if the image to which compression is to be applied contains different kinds of rasterized content originating from different source types. For example, the image may consist of a color photo, vector graphics elements and rendered text. If a compression method selected from either of the categories described above is applied to such diverse image content, the end result is not optimal. Either the visual quality drops to an unacceptable level or the image file size remains too large to be practical for storing and transferring purposes.
The challenge described above is often present when the content in the raster canvas is very diverse, for example consisting of color photos, vector drawings and rendered text. This happens because, prior to saving the contents of the raster canvas into a raster image file, a single compression method is applied over the entire raster canvas area. While both lossy and lossless compression approaches have their pros and cons, the diverse visual content often found in typical document pages represents the kind of content that is extremely difficult to compress well while maintaining the desired visual quality, observed as text readability, image quality, sharpness and preservation of details.
A particularly challenging case of the above-described situation arises when a document with both vector and raster content is paginated, i.e. the content is divided into one or more discrete pages, and the content of the resulting document pages is converted into raster image form for convenient storage, transfer and display.