1. Technical Field
The present invention generally relates to buffered reading of data. More particularly, the invention relates to improving runtime performance by automatically obtaining a predetermined value for the amount of data to place in a memory buffer. In this way, the buffer is filled only to the extent needed based on anticipated read requests, rather than always being filled completely.
2. Background Information
The conventional buffered reader 10, shown in FIG. 1, is a recurring abstraction implemented as part of the standard library of numerous modern object-oriented programming languages. Table 1 shows examples of the fully qualified class names of the conventional buffered reader in various programming languages:
TABLE 1
Fully Qualified Buffered Reader Class Names in Various Programming Languages
Language | Conventional Buffered Reader Class Name
Python | io.BufferedReader
C++ | iostream library, class filebuf
C# | System.IO.BufferedStream
Java | (1) java.io.BufferedReader; (2) java.io.BufferedInputStream
In this document, the term buffered reader refers to a general object-oriented pattern, of which the implementations listed in Table 1 are examples. A buffered reader, as its name suggests, provides buffered reading of an underlying data source 12. See FIG. 1. The data source may be a physical device like secondary storage (file on disk), a network card, random access memory, or any other data access device that can be interfaced with an operating system.
In buffered reading, the buffered reader fills a fixed-size main memory input buffer with the data needed to satisfy client read requests. The data type of the input buffer is usually, but not necessarily, an array of characters or an array of bytes. An element of the input buffer array is referred to hereafter as an input buffer element.
Each input buffer fill operation is typically accomplished through a single operating system (OS) read system call. Depending upon the specific underlying data source, a read system call can be relatively costly in terms of runtime performance. Fortunately, once the main memory input buffer is filled, read requests made by the application are satisfied rapidly.
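As an illustration only, the fill-then-serve cycle described above can be sketched in Python with a hypothetical `SimpleBufferedReader` class. It is not any library's implementation: it assumes the data source is any object with a `read(n)` method and treats each call to that method as one OS read system call.

```python
import io


class SimpleBufferedReader:
    """Hypothetical sketch of the buffered-reader pattern.

    Assumes `source` exposes read(n) -> bytes; each source.read() call
    stands in for one OS read system call.
    """

    def __init__(self, source, buffer_size=8192):
        self._source = source
        self._buffer_size = buffer_size
        self._buf = b""      # current contents of the input buffer
        self._pos = 0        # index of the next unread byte in _buf
        self.fill_count = 0  # number of simulated read system calls

    def _fill(self):
        # A single bulk read refills the whole buffer.
        self._buf = self._source.read(self._buffer_size)
        self._pos = 0
        self.fill_count += 1

    def read(self, n):
        """Return up to n bytes, refilling the buffer only when it is empty."""
        out = bytearray()
        while len(out) < n:
            if self._pos >= len(self._buf):
                self._fill()
                if not self._buf:  # end of data
                    break
            take = min(n - len(out), len(self._buf) - self._pos)
            out += self._buf[self._pos:self._pos + take]
            self._pos += take
        return bytes(out)


# Usage: io.BytesIO stands in for the underlying data source.
reader = SimpleBufferedReader(io.BytesIO(bytes(range(100))), buffer_size=16)
chunk = reader.read(40)  # served by three 16-byte fills
small = reader.read(3)   # served entirely from the buffer, no new fill
```

Once the third fill has happened, the 3-byte request costs no further call to the source, which is the rapid path the paragraph above describes.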
A disadvantage of conventional buffered reading is that, when the buffered reader fills its internal buffer, it does not know how much data the client application will actually need to read. As a result, some of the data transferred from the operating system to the buffered reader may never be used.
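The read-ahead waste can be observed with Python's `io.BufferedReader`. In this sketch an `io.BytesIO` object stands in for the underlying data source (an assumption for illustration), and the 64-byte buffer size is arbitrary. In CPython, the client's 10-byte request triggers a full 64-byte fill, so if the client stops there, 54 of the transferred bytes go unused.

```python
import io

# io.BytesIO stands in for the underlying data source (assumption for
# illustration); the buffered reader pulls from it in buffer-sized chunks.
source = io.BytesIO(bytes(1000))
reader = io.BufferedReader(source, buffer_size=64)

header = reader.read(10)     # the client only ever needs 10 bytes

consumed = reader.tell()     # logical position: bytes handed to the client
transferred = source.tell()  # physical position: bytes pulled from the source
wasted = transferred - consumed  # read ahead but never requested
```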
Buffered reading decouples the number of read requests issued by an application from the number of OS read system calls that are required. If the application makes numerous small read requests, as it would, for example, to deserialize an object consisting of numerous small attributes, the read requests can usually be satisfied from the input buffer 14, with only occasional read system calls to refill the buffer.
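This decoupling can be observed by counting calls into the data source. The `CountingBytesIO` helper below is hypothetical: it subclasses `io.BytesIO` and counts `readinto()` calls, each standing in for one OS read system call. With CPython's `io.BufferedReader` and an arbitrary 64-byte buffer, 256 one-byte read requests should need only four calls to the source.

```python
import io


class CountingBytesIO(io.BytesIO):
    """Hypothetical in-memory data source that counts readinto() calls.

    Each readinto() call stands in for one OS read system call issued by
    the buffered reader.
    """

    def __init__(self, data):
        super().__init__(data)
        self.readinto_calls = 0

    def readinto(self, b):
        self.readinto_calls += 1
        return super().readinto(b)


source = CountingBytesIO(bytes(range(256)) * 4)  # 1024 bytes of data
reader = io.BufferedReader(source, buffer_size=64)

# 256 tiny read requests, as a deserializer reading small fields might issue.
values = [reader.read(1)[0] for _ in range(256)]

requests = len(values)             # client read requests
raw_reads = source.readinto_calls  # reads actually issued to the source
```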
Buffered readers typically expose through their public interface the concept of the current logical position, or just current position. This is the current logical offset within the data being read, conceptualizing the source data as an array of the input buffer element type. The current logical position tends to lag the current physical position within the data being read, as the buffered reader reads ahead to fill its buffer. A buffered reader automatically maintains the current logical and physical positions and refills its main memory input buffer as necessary to satisfy client requests.
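Python's `io.BufferedReader` exposes the current logical position through its `tell()` method, while the wrapped source's own `tell()` reveals the current physical position. In this sketch `io.BytesIO` stands in for the data source and the 16-byte buffer size is arbitrary; the positions noted reflect CPython's behavior.

```python
import io

source = io.BytesIO(bytes(range(100)))
reader = io.BufferedReader(source, buffer_size=16)

first = reader.read(1)
logical = reader.tell()   # logical position seen by the client: 1
physical = source.tell()  # physical position in the source: 16 (read ahead)

second = reader.read(15)  # drains the rest of the first fill, no new read
third = reader.read(1)    # buffer exhausted, so the reader refills it
```

After the refill, the logical position is 17 while the physical position has jumped to 32, so the logical position again lags behind.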
A survey of the common kinds of methods found on the buffered reader classes mentioned in Table 1 above follows.
Construction and Setup
Typically, the constructor takes a file, input stream, or file reader object and an optional buffer size. If a buffer size is not provided, a default is typically assumed. In C++, for example, filebuf has a parameterless constructor; the file to open and the buffer to use are then supplied through member functions such as open() and pubsetbuf().
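In Python, for example, `open()` in binary read mode returns an `io.BufferedReader`, and the class can also be constructed directly around a raw stream with an optional `buffer_size`. The sketch below uses a temporary file as the data source; the file contents and the 4096-byte size are arbitrary choices for illustration.

```python
import io
import os
import tempfile

# Hypothetical data: a small temporary file stands in for the data source.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello, buffered world")
    path = tmp.name

try:
    # open() in binary read mode returns an io.BufferedReader; a buffering
    # value greater than 1 sets the buffer size in bytes.
    with open(path, "rb", buffering=4096) as f:
        kind = type(f)
        first = f.read(5)

    # Explicit construction around an unbuffered raw file object.
    # If buffer_size were omitted, io.DEFAULT_BUFFER_SIZE would be used.
    raw = io.FileIO(path, "r")
    with io.BufferedReader(raw, buffer_size=4096) as br:
        second = br.read(5)
finally:
    os.remove(path)

default_size = io.DEFAULT_BUFFER_SIZE
```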
Examining Capability and State
There are numerous methods for examining a buffered reader's capability and state. Some of the more common methods of this type answer the following questions: Is reading supported? Is seeking supported? Is there data available to read? How much? There is considerable variability that we will not cover here, as these details are immaterial for describing the present invention.
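Some of these questions map onto Python's `io.BufferedReader` as in the sketch below, with `io.BytesIO` standing in for the data source. Python has no single method that reports how much data remains; here a non-empty `peek()` result serves as the "is there data available?" check, which is one possible approach rather than a dedicated API.

```python
import io

reader = io.BufferedReader(io.BytesIO(b"abc"), buffer_size=16)

# Capability queries; both are delegated to the wrapped source.
can_read = reader.readable()
can_seek = reader.seekable()

# Data-availability check: peek() returns buffered bytes without consuming
# them, or an empty bytes object once the end of data is reached.
has_data_before = len(reader.peek(1)) > 0
reader.read(3)
has_data_after = len(reader.peek(1)) > 0
```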
Seeking
Seeking is used to change the buffered reader's current position to begin reading at an offset relative to the current position, start of data, or end of data. Seeking lets the user move forwards or backwards in the data, providing random access.
Skipping is a restricted form of seeking, where the user can fast forward through the data by a specified distance.
In some implementations, both skipping and seeking are supported. In other implementations, only one or the other is supported. Seeking or skipping to a position outside the range of data currently held in the input buffer invalidates the input buffer, and the buffer is refilled when reading resumes at the new location.
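Seeking and buffer invalidation can be sketched with Python's `io.BufferedReader`, which has no separate skip method; a relative `seek(n, io.SEEK_CUR)` plays that role here. `io.BytesIO` stands in for the data source, and its `tell()` shows whether the reader went back to the source. The specific offsets reflect CPython's behavior with an arbitrary 16-byte buffer.

```python
import io

source = io.BytesIO(bytes(range(100)))
reader = io.BufferedReader(source, buffer_size=16)

reader.read(1)                     # first fill: source is now at offset 16

reader.seek(5)                     # target lies inside the buffered range
near = reader.read(1)
source_after_near = source.tell()  # still 16: served from the buffer

reader.seek(40, io.SEEK_CUR)       # relative seek ("skip") from 6 to 46
far = reader.read(1)
source_after_far = source.tell()   # buffer discarded, refilled from 46

last = reader.seek(-1, io.SEEK_END)  # seek relative to the end of data
```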
Reading
Read methods are used to copy data from the buffered reader to the application that is using it. Read methods advance the current position. They vary widely, returning anything from a single byte, to a sequence of bytes, to values of numerous strong types. Strongly typed read methods perform type conversions on the data in the input buffer.
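Strongly typed getters can be layered on top of a byte-oriented buffered reader. The functions `read_int32_le` and `read_float64_le` below are hypothetical helpers, not part of Python's `io` API; they use the standard `struct` module to convert bytes taken from the reader into typed values, and each call advances the current position.

```python
import io
import struct


def read_int32_le(reader):
    """Hypothetical strongly typed getter: little-endian signed 32-bit int."""
    data = reader.read(4)
    if len(data) != 4:
        raise EOFError("not enough data for an int32")
    return struct.unpack("<i", data)[0]


def read_float64_le(reader):
    """Hypothetical strongly typed getter: little-endian IEEE 754 double."""
    data = reader.read(8)
    if len(data) != 8:
        raise EOFError("not enough data for a float64")
    return struct.unpack("<d", data)[0]


# Serialize two fields, then read them back through a buffered reader.
payload = struct.pack("<id", -7, 2.5)
reader = io.BufferedReader(io.BytesIO(payload))

count = read_int32_le(reader)    # -7
ratio = read_float64_le(reader)  # 2.5
position = reader.tell()         # 12: each getter advanced the position
```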
Some conventional buffered reader implementations support a read method that copies data from the buffered reader directly into a buffer provided by the client application. With a large client-supplied buffer, such a read method allows a client to make a large read request. In fact, the read request may be much larger than the buffered reader's own internal buffer. In this case, an efficient buffered reader implementation will bypass its internal buffering mechanism and issue a large read system call that writes directly into the client's buffer.
Disadvantages of this approach, from an object-oriented point of view, relate to broken encapsulation of both buffering and of access to the underlying data. To use this type of read method, the client application must allocate and manage its own buffer rather than encapsulating buffering within the buffered reader. And once the client's buffer is filled, the client is responsible for handling any deserialization required to structure the data in its buffer, rather than encapsulating deserialization as part of a strongly typed getter interface on the buffered reader itself.
Some implementations also support an “unget” method for putting data back and rewinding the current position if the application has read too far.
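CPython's `io.BufferedReader.readinto()` behaves this way: when the client's buffer is larger than the internal buffer, it reads directly into the client's buffer. In the sketch below, the hypothetical `CountingBytesIO` helper counts calls into the source, and the buffer sizes are arbitrary. The example also shows the encapsulation cost noted above: the client allocates `client_buf` and must decode its bytes itself.

```python
import io


class CountingBytesIO(io.BytesIO):
    """Hypothetical in-memory source that counts readinto() calls."""

    def __init__(self, data):
        super().__init__(data)
        self.readinto_calls = 0

    def readinto(self, b):
        self.readinto_calls += 1
        return super().readinto(b)


source = CountingBytesIO(bytes(range(200)))
reader = io.BufferedReader(source, buffer_size=16)

# The client allocates and owns a buffer much larger than the reader's own.
client_buf = bytearray(128)
n = reader.readinto(client_buf)

raw_reads = source.readinto_calls            # one large read into client_buf
read_ahead = source.tell() - reader.tell()   # nothing held in internal buffer

# Deserialization is now the client's job, e.g. decoding a 16-bit field.
first_u16 = int.from_bytes(client_buf[0:2], "little")
```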
Peeking
Peek methods, like read methods, return data to the application but they do not advance the current position.
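In Python, `io.BufferedReader.peek()` returns buffered bytes without advancing the position; it may return more or fewer bytes than requested. The sketch uses `io.BytesIO` as the data source and an arbitrary HTTP-like payload for illustration.

```python
import io

reader = io.BufferedReader(io.BytesIO(b"GET /index.html"), buffer_size=16)

preview = reader.peek(3)        # may return more than 3 bytes
pos_after_peek = reader.tell()  # unchanged by peek
is_get = preview[:3] == b"GET"

method = reader.read(3)         # read() does advance the position
pos_after_read = reader.tell()
```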
Close
A buffered reader typically has a close() method to close the underlying input stream or file reader by which the buffered reader acquires the data to fill its input buffer.
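In Python, closing an `io.BufferedReader`, for example by leaving a `with` block, also closes the raw stream it wraps. The sketch uses `io.BytesIO` as a stand-in data source.

```python
import io

source = io.BytesIO(b"some data")
with io.BufferedReader(source) as reader:
    data = reader.read(4)

# Leaving the with-block calls reader.close(), which also closes the source.
reader_closed = reader.closed
source_closed = source.closed
```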
Other
Some buffered reader implementations provide methods that truncate the data or rewrite the data. This invention focuses on reading, so these methods are not important for this discussion.