Processing data over a network occurs in many contexts including data reconstruction, disaster recovery, storage, encryption, encoding, content serving, and others. A system's processor configuration may affect its data processing efficiency. Likewise, a processor's data communication techniques may also affect data processing efficiency. These effects may be particularly noticeable when reconstructing erroneous or lost data from a failed disk or storage system. For example, processor configuration may affect the throughput and latency characteristics associated with conventional communication techniques for processing data in networks.
When digital data is transmitted or stored, errors (when a data element is corrupted) and erasures (when a data element is missing or known to be faulty) may occur in the data stream. Erasure codes are used in many applications to efficiently protect and reconstruct data when stored or transmitted. Reed-Solomon based erasure codes have been used for many years because they are computationally convenient. Existing solutions are efficient when the number of drives is small. For larger drive systems, the latency of the system tends to be high, which is a problem when disk access is desired during the process of reconstruction.
FIG. 1A illustrates a conventional tree-based processor configuration (simple centralized star configurations are more common, and slower). Generally, each processor may be represented by a node, and the nodes may be arranged in a network or pattern using a conventional topology representation. Here, a simple tree made up of processors, or nodes, 30-36 illustrates a nodal pattern for processing blocks n, n−1, n−2, . . . 1, and 0. Processors or nodes 30-36 perform data communication and processing functions on data blocks at a given time index in a serial fashion. Data blocks may be packets, frames, segments, or other data encapsulation formats having one or more associated values. Each block may also represent a portion of a data stream. In some embodiments, nodes 30-36 represent a “bucket brigade” processor or processing system. Each node receives a value associated with a data block and performs an action (e.g., computing a function) before sending an updated value or data block to the next node in the chain. A bucket brigade generally has good throughput but high latency, because each block must pass through every node in turn and the latency therefore grows with the number of nodes.
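The bucket-brigade behavior described above can be sketched as a short simulation. This is a minimal illustration rather than the configuration of FIG. 1A itself: the function name `bucket_brigade`, the use of XOR as the per-node computation, the one-hop-per-time-step timing, and the pipelining of successive blocks are all assumptions made for the example.

```python
def bucket_brigade(node_values, blocks):
    """Simulate a chain of nodes processing a stream of blocks.

    Assumptions (hypothetical, for illustration only):
      - node i folds its local value into the block with XOR;
      - each hop from one node to the next takes one time step;
      - the chain is pipelined, so a node starts on the next block
        as soon as it has forwarded the current one.
    """
    k = len(node_values)
    results = []
    for block in blocks:
        value = block
        # The block visits every node in order, one hop per step.
        for node_value in node_values:
            value ^= node_value
        results.append(value)

    # Latency of a single block: it must traverse all k nodes.
    latency_steps = k
    # With pipelining, one finished block leaves the chain per step
    # once the first block has come out the far end.
    total_steps = k + len(blocks) - 1 if blocks else 0
    return results, latency_steps, total_steps


results, latency, total = bucket_brigade([1, 2, 4], [0, 8])
print(results, latency, total)
```

Under these assumptions, the per-block latency equals the chain length, while each additional block in the stream adds only one step to the total, which is the throughput and latency trade-off noted above.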
FIG. 1B illustrates a conventional extended tree-based processor configuration. In some embodiments, the simple tree-based configuration of FIG. 1A is extended to a system of pairs of nodes (four pairs in the depicted instance) where each pair represents a parent and child node. Each node shown could be an intermediate node or a leaf (an endpoint) in the tree, and the root may be represented by a destination node, also known as a sink (not shown). Three sequential time indices, or steps, t=0, 1, and 2, during the processing of a data block, or value, are shown. At t=0, a data block is sent from each of odd nodes 40, 44, 48, and 52 (1, 3, 5, and 7) to even nodes 42, 46, 50, and 54 (2, 4, 6, and 8), respectively. Upon receipt by the even nodes, the value of the data blocks may be included in a computation before the next time index or step.
At t=1, the data blocks are sent from nodes 42 and 50 to nodes 46 and 54, respectively. Again, a computation may be performed on the value of the data block, thereby changing the value of the data block, at the receiving nodes. At t=2, a data block is sent from node 46 to node 54, where a final data block value is accumulated. At the next time index or step, the data block having the final accumulated result is then sent to a destination node (not shown). In general, the required tree depth, and hence the number of steps, is logarithmic in the number of nodes.
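The pairwise schedule described for FIG. 1B can be sketched as follows. This is a minimal illustration under stated assumptions: nodes are numbered 1 through n as in the parenthetical numbering above (nodes 40 through 54 correspond to 1 through 8), n is a power of two, and the function name `tree_reduce` and the `combine` parameter are hypothetical names introduced for the example.

```python
import operator


def tree_reduce(values, combine):
    """Accumulate one value per node using the pairwise tree schedule.

    values[i] is the value held by node i + 1 (1-based node numbers).
    Returns the final accumulated value and the list of transfers
    (sender, receiver) performed at each time step.
    Assumes len(values) is a power of two (hypothetical restriction
    that keeps the sketch short).
    """
    n = len(values)
    if n < 1 or n & (n - 1):
        raise ValueError("number of nodes must be a power of two")

    acc = list(values)
    schedule = []
    stride = 1
    while stride < n:
        transfers = []
        # At this step, senders are nodes stride, 3*stride, 5*stride, ...
        for sender in range(stride, n + 1, 2 * stride):
            receiver = sender + stride
            # The receiver folds the incoming value into its own.
            acc[receiver - 1] = combine(acc[receiver - 1], acc[sender - 1])
            transfers.append((sender, receiver))
        schedule.append(transfers)
        stride *= 2

    # The last node holds the final accumulated result.
    return acc[n - 1], schedule


total, schedule = tree_reduce([1, 2, 3, 4, 5, 6, 7, 8], operator.add)
for t, transfers in enumerate(schedule):
    print(f"t={t}: {transfers}")
print("result at node 8:", total)
```

For eight nodes the sketch produces three steps, matching t=0, 1, and 2 above: four transfers, then two, then one, with the result accumulating at node 8 (node 54).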
However, this configuration technique is inefficient because each node performs at most one task at a time (e.g., sending, receiving, storing/accumulating, performing a computation, or others), and many nodes have no tasks during most of the steps. Further, throughput is low: because a few nodes are repeatedly busy, a long time passes before processing of the next elements of a data stream can begin. Latency in this configuration, on the other hand, is low, since a computation completes quickly once started.
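The idleness noted above can be made concrete with a short count of how many nodes take part in a transfer at each step of the pairwise tree schedule. This is a rough sketch, assuming a power-of-two node count and counting only sending and receiving as tasks; the function name `busy_nodes_per_step` is a hypothetical name introduced for the example.

```python
def busy_nodes_per_step(n):
    """Count nodes that send or receive at each step of the tree schedule.

    Assumes n is a power of two and that each transfer occupies exactly
    two nodes (one sender, one receiver) for one time step.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("number of nodes must be a power of two")

    busy = []
    stride = 1
    while stride < n:
        transfers = n // (2 * stride)  # pairs active at this step
        busy.append(2 * transfers)     # one sender and one receiver each
        stride *= 2
    return busy


n = 8
busy = busy_nodes_per_step(n)
utilization = sum(busy) / (n * len(busy))
print(busy, round(utilization, 3))
```

Under these assumptions the number of busy nodes halves at every step, while the final receiving node takes part in every step, so it cannot begin work on the next element of the stream until the current accumulation completes.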
As a result, conventional nodal patterns configured using conventional techniques suffer from processing delays and/or latencies, slowing tasks such as responding to requests for data, encoding, encryption, data reconstruction, catastrophe recovery, and the like. Further, conventional configuration techniques often require the implementation of expensive and complex hardware and software to compensate for increased latency.
In view of the foregoing, there is a need for systems and methods that overcome such deficiencies.