Transcoding is the direct conversion of digital data from one encoding to another, such as for movie data files or audio files. Transcoding is performed for many reasons, including cases where a target device (or workflow) does not support the format or has limited storage capacity or bandwidth that mandates a reduced file size, or to convert incompatible or obsolete data to a better-supported or modern format. For example, Apple® ProRes is widely used as a common format for digital video, but the data size of a two-hour movie in that format can be substantial. The large size can increase the cost and difficulty of handling movie files. Transcoding these types of files into, e.g., an MPEG-4 format can compress them to less than 10% of their original size.
Transcoding is commonly a lossy process. Lossy-to-lossy transcoding in particular introduces varying degrees of generation loss. In other cases, transcoding from a lossy format to a lossless or uncompressed format is technically a lossless conversion because no information is lost; however, the process is irreversible and is more suitably described as “destructive.”
Video transcoding can be a slow process, taking many minutes or hours, even with the fastest available hardware. File-based transcoding is usually entirely asynchronous—the transcoded file cannot be used until the process is complete.
Traditionally, when video files are transcoded: (1) transcoding does not start until the entire file is available in the transcoding system for transcoding, (2) the resultant transcoded file cannot be played until transcoding is complete, or (3) both (1) and (2).
Under current practice, it is difficult to process a file while it is still being written. Existing approaches can only read in (and process/output) a file in its current, incomplete state, regardless of whether it is still being written. Even where an existing technology is able to read in (and process/output) a file as it is written, it has no way of knowing when the file has been completely written to non-transient storage.
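The snapshot behavior described above can be illustrated with a minimal Python sketch. The names read_snapshot and demo, the file name, and the byte counts are hypothetical and chosen only for illustration; the sketch assumes an ordinary local file system in which flushed data is immediately visible to other readers.

```python
import os
import tempfile
from typing import Tuple


def read_snapshot(path: str) -> bytes:
    """Read whatever bytes exist right now.

    Nothing in the result indicates whether the writer is finished.
    """
    with open(path, "rb") as f:
        return f.read()


def demo() -> Tuple[int, int]:
    """Return (bytes seen by an early reader, final file size)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "growing.bin")  # hypothetical file name
        with open(path, "wb") as writer:
            writer.write(b"A" * 1000)
            writer.flush()                   # first half is now visible to readers
            snapshot = read_snapshot(path)   # reader arrives while the file is growing
            writer.write(b"B" * 1000)        # writer keeps going after the read
        final_size = os.path.getsize(path)
    return len(snapshot), final_size
```

In this sketch the early reader receives only the first half of the data, and the read completes normally, so the reader cannot distinguish a partial file from a complete one.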
Existing programs assume a file (i.e., not a pipe or stream) is complete or whole and not growing on disk, so the programs typically fail if they try to read a file that is still being written. This cannot be solved using a chain of pipe commands, or “tee” commands with pipes. If any of the processes in the pipe-chain fails, the entire set of processes stops. Pipe-based commands do not allow for retrying from the beginning of the file. Furthermore, some files cannot be processed as a stream because some level of random access is needed, and some existing programs are unable to read from pipes at all. When using a named pipe, if a reader stops reading, the buffer fills and tee can no longer write the file, so processing halts indefinitely.
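The point that pipes allow neither a retry from the beginning nor random access can be shown with a short Python sketch. It assumes a POSIX-like system, where seeking on a pipe fails; the helper names can_rewind and demo are hypothetical.

```python
import os
import tempfile
from typing import Tuple


def can_rewind(fd: int) -> bool:
    """Return True if the descriptor can be repositioned to its start."""
    try:
        os.lseek(fd, 0, os.SEEK_SET)
        return True
    except OSError:
        # On POSIX systems a pipe rejects lseek (ESPIPE), so a reader
        # cannot restart from the first byte or jump to an offset.
        return False


def demo() -> Tuple[bool, bool]:
    """Return (pipe is rewindable, regular file is rewindable)."""
    read_fd, write_fd = os.pipe()
    try:
        pipe_ok = can_rewind(read_fd)
    finally:
        os.close(read_fd)
        os.close(write_fd)
    with tempfile.TemporaryFile() as f:
        file_ok = can_rewind(f.fileno())
    return pipe_ok, file_ok
```

Under the stated assumption, only the regular file can be rewound, which is why a failed stage in a pipe-chain cannot simply be restarted on the same input.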
Other existing approaches, such as using “cat” and “tail” on a file, also have limitations. Cat will only read the entire file as it exists at that point in time; it will not wait for the file to finish being written. Tail (with option “-c +1” to start reading from the beginning of the file, and option “-f” to continue following the end of the file as it is being written) will wait indefinitely for further data until it is stopped by an external process.
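The follow behavior attributed to tail above can be approximated in Python to make the limitation concrete. This is only an illustrative sketch, not the implementation of tail: the function name follow, its parameters, and the use of a threading.Event as the external stop signal are all assumptions introduced here.

```python
import threading
import time
from typing import Iterator


def follow(path: str,
           stop: threading.Event,
           poll_interval: float = 0.05,
           chunk_size: int = 65536) -> Iterator[bytes]:
    """Yield a file's bytes from the start, then keep polling for more.

    The loop has no intrinsic way to learn that the writer is done;
    it ends only once some external party sets `stop` and the
    remaining data has been drained.
    """
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if chunk:
                yield chunk                  # data available: pass it on
            elif stop.is_set():
                return                       # at end of data and told to stop
            else:
                time.sleep(poll_interval)    # at end of data: wait for growth
```

If nothing ever sets the stop event, the generator polls forever, which mirrors the indefinite wait described above. The completion signal has to come from outside the reader.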