Data streaming permits data to be obtained from storage on an as-needed basis. In data streaming, data is requested from a storage system (e.g., a file system or a database system). Chunks of data are obtained sequentially until the request is fulfilled. Typically, each chunk of data in the sequence includes a specified number of bytes. Thus, conventional data streaming typically fetches equal-sized chunks of data in order until sufficient data has been obtained to fulfill the request.
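The chunked fetching described above may be illustrated by the following minimal sketch. The function name `stream_bytes`, the chunk size of four bytes, and the use of an in-memory buffer in place of a storage system are assumptions made only for illustration.

```python
import io


def stream_bytes(source, num_bytes, chunk_size=4):
    """Fetch equal-sized chunks in order until num_bytes have been obtained."""
    fetched = bytearray()
    while len(fetched) < num_bytes:
        chunk = source.read(chunk_size)  # one sequential, fixed-size fetch
        if not chunk:                    # storage exhausted before the request was met
            break
        fetched.extend(chunk)
    return bytes(fetched[:num_bytes])


# Hypothetical storage: an in-memory buffer standing in for a file or database.
storage = io.BytesIO(b"0123456789abcdef")
result = stream_bytes(storage, 10)
```

In this sketch a request for ten bytes is met by three four-byte fetches, so twelve bytes are read from storage and the first ten are returned.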
Character-based data includes encoded data that is used to represent characters. For example, character-based data may be stored using mixed-byte encoding. Mixed-byte encoding utilizes a varying number of bytes to encode each character. However, other encoding schemes may be used. Such encoding schemes may vary the number of bytes that are used to encode a character, or may use a fixed number of bytes to encode each character. Character-based data can be converted into characters (e.g., text).
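The difference between variable-width and fixed-width encodings may be illustrated as follows. UTF-8 and UTF-32 are used here only as familiar stand-ins; they are an assumption of this sketch and are not necessarily the mixed-byte encoding contemplated above.

```python
# Three characters: "a", "e" with acute accent (U+00E9), and the euro sign (U+20AC).
text = "a\u00e9\u20ac"

# Variable-width encoding: each character may occupy a different number of bytes.
encoded = text.encode("utf-8")
widths = [len(ch.encode("utf-8")) for ch in text]

# Fixed-width encoding: every character occupies four bytes (no byte-order mark).
fixed = text.encode("utf-32-le")

# Conversion of the character-based data back into characters.
decoded = encoded.decode("utf-8")
```

Here the same three characters occupy six bytes in the variable-width encoding and twelve bytes in the fixed-width encoding.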
Data streaming may be desired for character-based data. FIG. 1 depicts a conventional method 10 for performing streaming of character-based data that may be encoded using an encoding having a variable number of bytes per character (such as mixed-byte encoding). FIG. 2 depicts a conventional system 30 for performing streaming of character-based data. The system 30 includes an input stream reader 32, a client 34, and a storage system 40 used to store the data. Referring to FIGS. 1 and 2, a request for character-based data is provided to the input stream reader 32 from the client 34 (step 12). The request originates from a user and is, therefore, typically for a fixed number of characters. Thus, for mixed-byte encoding, requests for the same number of characters may correspond to differing numbers of bytes, depending upon the number of bytes used to represent each of the characters.
The input stream reader 32 fetches from the storage system 40 a sufficient amount of character-based data to satisfy the request (step 14). The input stream reader 32 converts the character-based data that has been fetched into characters (step 16). The number of characters sufficient to fulfill the request is provided to the client 34 (step 18). Thus, the fixed number of characters is output in step 18. Any remaining data is discarded (step 20).
Although the conventional method 10 and system 30 function, the method 10 and system 30 are inefficient. As discussed above, the request is for a fixed number of characters. However, for encoding schemes such as mixed-byte encoding, the same number of characters may correspond to differing numbers of bytes of character-based data. The exact amount of character-based data for the fixed number of characters in a particular request is therefore not known in advance. As a result, a sufficient amount of data to satisfy any request, not just the request at hand, is fetched in step 14. Thus, a large amount of data (e.g., an entire document) is typically fetched in step 14. However, the request may be for only a small portion of the document. Consequently, a large amount of data may be unnecessarily fetched, converted, and then discarded.
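Steps 14 through 20 of the conventional method 10, and the resulting waste, may be sketched as follows. The function name `conventional_read`, the sample document, and the use of UTF-8 are assumptions made for illustration; the sketch models the fetch of step 14 as a fetch of the entire document.

```python
def conventional_read(storage, num_chars, encoding="utf-8"):
    """Hypothetical sketch of steps 14-20: fetch everything, convert, output, discard."""
    fetched = storage                      # step 14: fetch enough data for any request
    chars = fetched.decode(encoding)       # step 16: convert all fetched data
    output = chars[:num_chars]             # step 18: provide the requested characters
    discarded = len(chars) - len(output)   # step 20: the remainder is thrown away
    return output, discarded


# Hypothetical document of 6000 characters stored in a variable-width encoding.
document = ("h\u00e9llo " * 1000).encode("utf-8")
output, discarded = conventional_read(document, 5)
```

In this sketch a request for five characters causes all 6000 characters of the document to be fetched and converted, after which 5995 of them are discarded.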
Other conventional methods for performing character-based data streaming may function in the same manner as conventional data streaming. In such conventional methods, a request is made and a fixed number of bytes is fetched and converted using the converter (input stream reader) 32. This process is repeated, fetching and converting sequential chunks of data, until the request is fulfilled. However, such a conventional method may not be capable of handling encoding schemes in which the number of bytes per character varies, e.g., mixed-byte encoding. This is because a chunk of the character-based data may not correspond to a whole number of characters: the bytes of a single character may be split across two consecutive chunks, in which case neither chunk can be converted correctly on its own.
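The chunk-boundary problem described above may be demonstrated with the following sketch. The two-byte chunk size, the sample data, and the use of UTF-8 as the variable-width encoding are assumptions made for illustration.

```python
# "a", the euro sign (U+20AC), and "b": 1 + 3 + 1 = 5 bytes in UTF-8.
data = "a\u20acb".encode("utf-8")

# Fetch fixed-size chunks of two bytes, as in conventional data streaming.
chunk_size = 2
chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

# Attempt to convert each chunk independently of the others.
failures = []
for chunk in chunks:
    try:
        chunk.decode("utf-8")
    except UnicodeDecodeError:
        # The chunk begins or ends in the middle of a multi-byte character.
        failures.append(chunk)
```

Here the three bytes of the middle character straddle the first two chunks, so both of those chunks fail to convert and only the final one-byte chunk converts successfully.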