This invention relates to reliable communication of large amounts of related digital data, and more particularly to reliable point-to-multipoint communication of large files of data, where a file is a collection of digital data.
There is an inherent problem in transmitting a large file of digital data from a source node over a communication network to multiple destinations with confidence that it has been received accurately. A single source node embodiment is described in this document for purposes of illustration, though multiple source nodes may be used to send a single source file. The file can be an operating system update, a high-definition video, or the data collected by environmental sensors. There is a need for an efficient transmission scheme for such tasks that takes the following performance metrics into consideration.
    1. Encoding/decoding complexity: The complexity should ideally be linear in the total number of packets.
    2. Network transmission delay: To allow the scheme to support time-sensitive services, a small transmission delay is necessary.
    3. Transmission throughput: For the multicast communication model, the scheme should support the target multicast capacity.
    4. Cost at the intermediate network nodes: Both the computation and storage requirements for the intermediate nodes (nodes that participate in the transmission but do not require the file) should be minimized.
    5. Transmission protocol overhead: The protocol overhead should be minimal.
The communication network can be a wireline network, a wireless network, or an overlay network built upon a combination of wireline and wireless networks. It is a prerequisite that the network transmission is packetized, i.e., all transmitted data are grouped into suitably-sized blocks, called packets. A network node can transmit packets to its neighbor nodes through wireless or wireline links. However, network links are known to be imperfect: a packet can be corrupted or erased during transmission, or an intermediate node may fail during transmission. A corrupted packet can be deleted and treated as an erasure. (In this model, we consider only erasures on the network links.)
Network coding is a known network transmission technique that generally improves network throughput and is resilient to packet loss. In particular, linear network coding is a known efficient way to achieve the multicast capacity of a network. Routing is a special case of linear network coding. Rather than merely routing packets, linear network coding allows intermediate network nodes to transmit new packets generated as linear combinations of the packets they have received. Such operations at the intermediate network nodes are referred to as recoding of network codes. However, as the size of the transfer matrix determined by the linear combination coefficients of the network increases, the complexity of encoding, recoding and decoding also increases quickly.
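The recoding operation described above can be illustrated with a minimal sketch. It assumes, purely for simplicity, that coefficients are drawn from GF(2), so that a linear combination reduces to a bitwise XOR; practical schemes may use larger finite fields. The names `recode` and `combine`, and the representation of a packet as a (coefficient bitmask, payload) pair of integers, are hypothetical and chosen only for this illustration.

```python
import random

def recode(received, rng):
    """Return one new coded packet as a random GF(2) combination of `received`.

    Each packet is a (coeffs, payload) pair of ints. Bit i of `coeffs` is set
    when source packet i is XORed into `payload`.
    """
    coeffs, payload = 0, 0
    for c, p in received:
        if rng.random() < 0.5:   # random GF(2) coefficient: 0 or 1
            coeffs ^= c          # the coefficient vector is combined the same way
            payload ^= p         # as the payload, so the two stay consistent
    return coeffs, payload

def combine(source, coeffs):
    """XOR together the source packets selected by the bitmask `coeffs`."""
    out = 0
    for i, s in enumerate(source):
        if (coeffs >> i) & 1:
            out ^= s
    return out
```

Because the coefficient vector travels with the payload, an intermediate node can recode without knowing the source packets, and a destination can decode by solving the linear system defined by the coefficient vectors it has collected.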
A number of file distribution schemes using network coding are known in the art. One common feature of known schemes is the concept of a chunk (also called a class or generation), wherein a data file to be transmitted is divided into equally-sized packets, and all the packets are grouped into several chunks of equal size, which can be disjoint or overlapping. During the transmission, network coding is applied only to packets in the same chunk. These schemes have low encoding/decoding complexity if the chunks are small. However, they all have various drawbacks and cannot meet the requirements of the performance metrics defined above.
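As a rough sketch of the chunking step described above, the following divides a file into equally-sized packets and groups them into disjoint or overlapping chunks. The function names, the zero-padding of the final packet, and the `overlap` parameter are assumptions made for illustration; the known schemes differ in how they choose the overlap, and the last chunk here may be shorter than the others when the packet count does not divide evenly.

```python
def packetize(data, packet_size):
    """Split `data` (bytes) into equally-sized packets, zero-padding the last."""
    padded_len = -(-len(data) // packet_size) * packet_size  # round up
    data = data.ljust(padded_len, b"\x00")
    return [data[i:i + packet_size] for i in range(0, len(data), packet_size)]

def group_into_chunks(packets, chunk_size, overlap=0):
    """Group packets into chunks; consecutive chunks share `overlap` packets."""
    step = chunk_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than chunk_size")
    # Stop early enough that no chunk consists solely of already-covered packets.
    stop = max(len(packets) - overlap, 1)
    return [packets[i:i + chunk_size] for i in range(0, stop, step)]
```

With `overlap=0` the chunks are disjoint; with a positive overlap, each chunk repeats the last packets of its predecessor.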
One problem of “chunk”-based schemes is how to schedule the transmissions of these chunks. A simple solution is to transmit the chunks one by one, where the source node keeps transmitting the coded packets of one chunk. A destination node collects packets until it can correctly decode the current chunk. Then the destination node sends a control message to the source node to indicate successful decoding. In such a scheme, however, the control message overhead can be large, and in a peer-to-peer application, the message must be sent to all the concerned network nodes. Moreover, this scheme is not scalable to multiple destinations because the transmission of a chunk does not stop until all the destinations decode the current chunk.
To resolve the issues of sequential scheduling, chunks can be scheduled in a round-robin order or randomly. However, both methods introduce new issues. First, neither method is efficient when only a small fraction of the chunks remains undecoded. Second, the intermediate nodes are required to buffer all the chunks. Precoding has been considered as a way to resolve the first issue, but this method does not work well for practical chunk sizes.
Overlapped chunks can improve the throughput of random scheduling for practical chunk sizes. Intuitively, the advantage of overlapped chunks is to use the decoded chunks to help the decoding of the other chunks. Known designs of overlapped chunks are heuristic, and the performance can only be evaluated by simulation.
One class of codes that is promising but heretofore not fully exploited is fountain codes. Fountain codes are a class of codes for transmitting messages through erasure channels with low encoding/decoding complexity. An erasure channel is a simple network with only one link. Fountain codes include LT codes, a class of codes introduced by M. Luby for erasure channels with low encoding/decoding complexity, and Raptor codes, which achieve even lower encoding/decoding complexity by combining a variation of LT codes with certain pre-codes.
The encoding of LT codes involves the following procedure: First, a pre-designed degree distribution is sampled and an integer value d is obtained. Then d distinct input packets are chosen randomly and added using bitwise sum to yield the output packet. Output packets are transmitted through an erasure channel and the number of transmitted output packets can be unlimited.
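The encoding procedure above can be sketched as follows. The text does not specify a particular degree distribution, so this sketch uses the ideal soliton distribution only as a placeholder; practical LT codes use more carefully designed distributions. Packets are modeled as integers so that the bitwise sum is the XOR operator, and the function names are hypothetical.

```python
import random

def ideal_soliton(K):
    """Probabilities for degrees 1..K (placeholder degree distribution)."""
    return [1.0 / K] + [1.0 / (d * (d - 1)) for d in range(2, K + 1)]

def lt_encode_packet(inputs, weights, rng):
    """Generate one LT output packet from the list of input packets.

    Returns (neighbors, value): the indices of the input packets that were
    combined, and their bitwise sum (XOR).
    """
    K = len(inputs)
    # Step 1: sample the degree d from the degree distribution.
    d = rng.choices(range(1, K + 1), weights=weights)[0]
    # Step 2: choose d distinct input packets uniformly at random.
    neighbors = rng.sample(range(K), d)
    # Step 3: add the chosen packets using bitwise sum.
    value = 0
    for i in neighbors:
        value ^= inputs[i]
    return sorted(neighbors), value
```

The encoder is rateless in the sense described above: `lt_encode_packet` can be called as many times as needed, and each call yields a fresh output packet.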
An LT decoder can use any n output packets to recover the original K input packets, where n is a certain number larger than K. The decoding process can be described by a decoding graph (called a Tanner graph) of the LT codes. A decoding graph is a bipartite graph with K nodes on one side and n nodes on the other side, called the variable nodes and the check nodes, which correspond to the input packets and output packets, respectively. There is an edge between an input packet and an output packet if the input packet contributes to the value of the output packet. At each step of the decoding algorithm, the decoder identifies an output packet of degree one. The value of an output packet of degree one is simply the value of its unique neighbor among the input packets. Once an input packet is recovered, its value is subtracted from the values of all the neighboring output packets, and this input packet and all its related edges are removed from the graph. Decoding stops when all input packets have been recovered or when no output packet of degree one remains.
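The decoding steps above can be sketched as a simple peeling loop. This is an illustrative, unoptimized version (it rescans all output packets on each pass, so it does not attain the low complexity of a practical decoder). It assumes packets are integers, so that subtraction is XOR, and that each output packet is given as a hypothetical (neighbors, value) pair listing the indices of the input packets it combines.

```python
def lt_peel_decode(K, outputs):
    """Recover input packets from LT output packets by peeling.

    `outputs` is a list of (neighbors, value) pairs. Returns a list of K
    entries; an entry is None if that input packet could not be recovered.
    """
    # Mutable copy of the decoding graph: [remaining neighbor set, value].
    pending = [[set(nb), val] for nb, val in outputs]
    recovered = [None] * K
    progress = True
    while progress:
        progress = False
        for pkt in pending:
            if len(pkt[0]) != 1:
                continue
            # An output packet of degree one equals its unique neighbor.
            i = next(iter(pkt[0]))
            recovered[i] = pkt[1]
            # Subtract the recovered value from every neighboring output
            # packet and remove the corresponding edges from the graph.
            for other in pending:
                if i in other[0]:
                    other[0].remove(i)
                    other[1] ^= recovered[i]
            progress = True
    return recovered
```

If the loop ends with `None` entries remaining, no output packet of degree one was left and more output packets are needed, which is why the degree distribution matters for this decoder.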
The degree distribution of an LT code needs to be carefully designed so that the LT code has both a low encoding/decoding complexity and the above decoding algorithm succeeds with high probability. The LT codes proposed by M. Luby require that all the input packets are recovered by the LT decoder. Raptor codes relax this condition in a way that only a constant fraction of the input packets need to be recovered. The remaining input packets are recovered using precoding.
Fountain codes, however, are not designed for general communication networks; simply applying fountain codes in such networks may not be optimal. For a general network, applying fountain codes link by link together with network coding can achieve the maximum throughput. But such a scheme has two drawbacks. First, both decoding and recoding of LT codes are needed at each intermediate node, so the complexity is not low. Second, a decoding delay proportional to the file size is incurred at each intermediate node, so the end-to-end decoding delay grows with the size of the network. For a network with a tree structure, the delay can be reduced by applying fountain codes in a stacked manner: an intermediate node buffers the packets it receives and re-encodes them using fountain codes, and a destination node decodes multiple layers of fountain codes. This method, however, merely moves all the decoding to the destination nodes.
Fountain codes are difficult to apply in networks employing linear network coding because network coding changes the degrees of packets, which causes efficient belief propagation decoding to fail. Heuristic algorithms have been developed for special communication scenarios such that the set of coded packets received at a destination node approximates an LT code. But in general it is difficult to guarantee, using distributed encoding, that the degrees of the received packets follow a specific distribution. Therefore, what is desired is a file transmission scheme that does not require excessive computation and storage at the intermediate nodes, and that can be used to efficiently distribute a file in a network employing linear network coding.
The following patents have been noted in the art: U.S. Pat. Nos. 7,068,729; 6,373,406; 6,307,487.
The following references provide background information for the present invention:
    S.-Y. R. Li, R. W. Yeung, and N. Cai, “Linear network coding,” IEEE Trans. Inform. Theory, vol. 49, no. 2, pp. 371-381, February 2003.
    P. Maymounkov, N. J. A. Harvey, and D. S. Lun, “Methods for efficient network coding,” in Proc. Allerton Conf. Comm., Control, and Computing, September 2006.
    T. Ho, B. Leong, M. Medard, R. Koetter, Y. Chang, and M. Effros, “The benefits of coding over routing in a randomized setting,” in Proc. IEEE ISIT '03, June 2003.
    D. S. Lun, M. Medard, R. Koetter, and M. Effros, “On coding for reliable communication over packet networks,” Physical Communication, vol. 1, no. 1, pp. 3-20, 2008.
    D. Silva, W. Zeng, and F. R. Kschischang, “Sparse network coding with overlapping classes,” CoRR, vol. abs/0905.2796, 2009.
    A. Heidarzadeh and A. H. Banihashemi, “Overlapped chunked network coding,” CoRR, vol. abs/0908.3234, 2009.
    R. Ahlswede, N. Cai, S.-Y. R. Li, and R. W. Yeung, “Network information flow,” IEEE Trans. Inform. Theory, vol. 46, no. 4, pp. 1204-1216, July 2000.
    R. Koetter and M. Medard, “An algebraic approach to network coding,” IEEE/ACM Trans. Networking, vol. 11, no. 5, pp. 782-795, October 2003.
    P. A. Chou, Y. Wu, and K. Jain, “Practical network coding,” in Proc. Allerton Conf. Comm., Control, and Computing, October 2003.
    M. Luby, “LT Codes,” in Proc. 43rd Ann. IEEE Symp. on Foundations of Computer Science, November 2002, pp. 271-282.
    A. Shokrollahi, “Raptor Codes,” IEEE Trans. Inform. Theory, vol. 52, no. 6, pp. 2551-2567, 2006.
    R. Gummadi and R. S. Sreenivas, “Relaying a fountain code across multiple nodes,” in Proc. IEEE Information Theory Workshop (ITW '08), 2008, pp. 149-153.
    M.-L. Champel, K. Huguenin, A.-M. Kermarrec, and N. L. Scouarnec, “LT network codes,” Techreport, INRIA, 2009.