1. Technical Field
The present invention relates generally to digital data processing systems, and, in particular, to structures and methods in digital data processing systems for maintaining ordered linked lists.
2. Background Art
In general, in the descriptions that follow, we will italicize the first occurrence of each special term of art which should be familiar to those skilled in the art of digital data processing systems. In addition, when we first introduce a term that we believe to be new or that we will use in a context that we believe to be new, we will bold the term and provide the definition that we intend to apply to that term. Since our invention is specifically intended for use in digital data processing systems, we will often use terms that are well known to those skilled in this particular art. For example, with respect to an individual element of data stored at a particular address in a memory component of such a system, we will typically use the term pointer to refer, not to the element per se, but to a separate and distinct element that contains the address of the referenced element. For convenience of reference, we will use the term element hereafter to refer to both discrete data and more complex objects, records or the like which may be viewed as single logical entities.
From at least the 1940s, programmers of digital data processing systems have employed various logical structures to store, retrieve and maintain sets of elements. In one popular structure, the linked list, each member of the list comprises, at a minimum, two components: (1) the actual element itself (what we will refer to as the load), and (2) a forward link containing a pointer to the immediately succeeding member on the list (we call this member the forward member). Thus, for example, in a linked list containing three members, the first member (often called the head of the list) contains a forward link that points to the second member of the list; the second member contains a forward link that points to the third member of the list; and the third and last member of the list (often called the tail of the list) contains a null forward link, indicating that there are no other members of the list. Such a list is referred to as singly-linked, since each member carries only a single link; consequently, an existing member of the list can be found only by searching, or walking, the list, starting at its head and proceeding toward its tail, until the desired member is found. If desired, each member can be expanded to include a third component: a backward link to the immediately preceding member of the list (we call this member the backward member). Such a list, commonly referred to as doubly-linked, can be walked from either direction as appropriate. In general, singly-linked lists are more memory efficient, while doubly-linked lists, in addition to being bi-directionally searchable, are less vulnerable to loss of continuity due to inadvertent damage to one of the link pointers.
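As a minimal sketch of the structures just described, assuming Python and the hypothetical names `Member`, `load`, `forward` and `backward` (none of which come from the original disclosure), each member holds its load and its links, and a member of a singly-linked list can be found only by walking forward from the head:

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(eq=False)
class Member:
    """One member of a linked list (hypothetical names, for illustration only)."""
    load: Any                            # the element itself
    forward: Optional[Member] = None     # link to the forward member; None at the tail
    backward: Optional[Member] = None    # used only in a doubly-linked list


def build_singly_linked(loads: list) -> Optional[Member]:
    """Build a singly-linked list from `loads` (in order) and return its head."""
    head: Optional[Member] = None
    for load in reversed(loads):
        head = Member(load, forward=head)
    return head


def walk_to(head: Optional[Member], target: Any) -> tuple[Optional[Member], int]:
    """Walk from the head toward the tail; return (member found, members visited)."""
    visited = 0
    member = head
    while member is not None:
        visited += 1
        if member.load == target:
            return member, visited
        member = member.forward
    return None, visited
```

For instance, `walk_to(build_singly_linked(["a", "b", "c"]), "c")` finds the tail only after visiting all three members, which is the cost the walk imposes on any search of a singly-linked list.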
A doubly-linked list has the additional advantage that, in a system having a mechanism for selecting members that is independent of the list-walking mechanism (e.g., a global search engine or a relational cross-referencing mechanism), the selected member's forward and backward link pointers can be used to remove the member from the list without invoking the list-walking mechanism to identify the backward member (which is not visible to a member of a singly-linked list).
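The removal described above can be sketched as follows, assuming Python and a hypothetical `Header` holding head and tail links; this is an illustration of standard doubly-linked unlinking, not the claimed method. Note that the function never walks the list: the member's own backward link supplies the backward member directly.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(eq=False)
class Member:
    load: Any
    forward: Optional[Member] = None
    backward: Optional[Member] = None


@dataclass(eq=False)
class Header:
    head: Optional[Member] = None
    tail: Optional[Member] = None


def unlink(header: Header, member: Member) -> None:
    """Remove `member` using only its own links (no walk is needed)."""
    if member.backward is not None:
        member.backward.forward = member.forward
    else:                                  # member was the head
        header.head = member.forward
    if member.forward is not None:
        member.forward.backward = member.backward
    else:                                  # member was the tail
        header.tail = member.backward
    member.forward = member.backward = None
```

In a singly-linked list the same operation would first require a walk from the head to locate the backward member.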
In an ordered linked list, the sequential position of each member of the list is related to a selected characteristic of that member. For example, members can be ordered temporally (e.g., by some relevant time relationship), spatially (e.g., by some relevant physical relationship), or by context (e.g., by some relevant logical relationship). In such a list, the location or position at which each new member is to be added or inserted is a function of the ordering relationship. Once a singly-linked list has been walked to find the appropriate point of insertion, the insertion operation requires two steps: (1) the forward link of the backward member must be copied to the forward link of the new member; and (2) the forward link of the backward member must be updated to point to the new member. In a doubly-linked list, the insertion operation requires two additional steps: (1) the backward link of the forward member must be copied to the backward link of the new member; and (2) the backward link of the forward member must be updated to point to the new member.
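The insertion steps above can be sketched in Python as follows. This assumes, purely for illustration, that each member's load is itself an integer ordering key and that the list is kept in ascending order; the function name `insert_ordered` is hypothetical. The walk locates the backward member, after which the two singly-linked steps (and, for a doubly-linked list, the two additional steps) are applied in the order given in the text.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Member:
    load: int                             # ordering key (assumed to be the load itself)
    forward: Optional[Member] = None
    backward: Optional[Member] = None


def insert_ordered(head: Optional[Member], new: Member,
                   doubly: bool = False) -> Member:
    """Insert `new` in ascending order of load; return the (possibly new) head."""
    # Walk to find the backward member: the last member whose load is smaller.
    backward: Optional[Member] = None
    member = head
    while member is not None and member.load < new.load:
        backward, member = member, member.forward
    if backward is None:                  # new member becomes the head
        new.forward = head
        if doubly and head is not None:
            head.backward = new
        return new
    forward = backward.forward
    new.forward = forward                 # step 1: copy backward member's forward link
    backward.forward = new                # step 2: point backward member at new member
    if doubly and forward is not None:
        new.backward = forward.backward   # step 3: copy forward member's backward link
        forward.backward = new            # step 4: point forward member back at new member
    elif doubly:                          # inserted at the tail: no forward member exists
        new.backward = backward
    return head
```

Inserting loads 30, 10, 20 and 40 in that order yields the list 10, 20, 30, 40, walkable in either direction when `doubly=True`.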
For the purpose of this disclosure, let us define every ordered linked list as consisting of at least one section. By definition, a section comprises an ordered series of members representing a continuous sequence; a missing member in the sequence inherently breaks the list into two sections. Thus, a complete list consists of a single section, whereas an incomplete list consists of more than one section, each separated from the adjacent section(s) by a gap. A primary objective of our invention is to provide an improved method for more efficiently managing the reassembly of segments (such as the TCP segments discussed below) into sections, and the merging of sections into complete ordered linked lists.
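To make the notion of sections concrete, here is a small Python sketch, assuming each member is identified by an integer sequence number and that the input is already in ascending order with no duplicates; the function name `sections` is hypothetical. Each maximal run of consecutive numbers is one section, and every missing number opens a gap.

```python
from __future__ import annotations
from collections.abc import Iterable


def sections(sequence_numbers: Iterable[int]) -> list[tuple[int, int]]:
    """Split ascending member sequence numbers into (first, last) sections."""
    result: list[tuple[int, int]] = []
    for n in sequence_numbers:
        if result and n == result[-1][1] + 1:
            result[-1] = (result[-1][0], n)   # continues the current section
        else:
            result.append((n, n))             # first member, or a gap: new section
    return result
```

For example, `sections([1, 3, 4, 5, 8])` returns `[(1, 1), (3, 5), (8, 8)]`: three sections separated by two gaps, whereas a complete list yields exactly one section.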
In the discussion to follow, we shall refer to linked lists as being either weakly-ordered or strongly-ordered. When we refer to a list as being weakly-ordered, we mean that the ordering relationship between members is a function of each member's load with respect to purely extrinsic criteria. Thus, for example, members may be ordered numerically based upon a particular numeric field within the load, but there is no expectation that the list will (or should) be continuous. Assume, by way of example, that in a linked list of a company's employees, the ordering relationship is a function of the load field containing the employee's social security number. Since the list clearly cannot contain all possible social security numbers, the list, even if so ordered, is only weakly so. In contrast, when we refer to a list as being strongly-ordered, we mean that the ordering relationship between members is a function of each member's load with respect to intrinsic criteria. Thus, for example, members may be ordered contextually based upon a particular text field within the load. Assume for this example that the linked list consists of short text segments, received over time (but not necessarily in proper order), of a considerably larger textual message. For the message to be comprehended, not only must all segments be present, but each must also be in its proper contextual relationship with respect to all other segments. Thus, this list, as so ordered, is strongly so.
In general, the primary access point of a linked list is a header block which contains, at a minimum, a forward link containing the pointer to the member at the head of the list. In a doubly-linked list (and sometimes for convenience in a singly-linked list), the header block will also include a backward link containing the pointer to the member at the tail of the list. For convenience, the header block may contain other information related to the status of the list, such as the number of members currently on the list.
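A header block of the kind just described might be sketched in Python as below; the class name `HeaderBlock`, its field names and the `append` method are hypothetical illustrations, not structures taken from the disclosure. Because the header keeps a link to the tail, a member can be added at the tail without walking the list, and the optional count is updated as a by-product.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(eq=False)
class Member:
    load: Any
    forward: Optional[Member] = None
    backward: Optional[Member] = None


@dataclass(eq=False)
class HeaderBlock:
    """Primary access point of a doubly-linked list (hypothetical layout)."""
    head: Optional[Member] = None   # forward link to the member at the head
    tail: Optional[Member] = None   # backward link to the member at the tail
    count: int = 0                  # optional status: members currently on the list

    def append(self, load: Any) -> Member:
        """Add a member at the tail, using the tail link instead of a walk."""
        member = Member(load, backward=self.tail)
        if self.tail is None:
            self.head = member
        else:
            self.tail.forward = member
        self.tail = member
        self.count += 1
        return member
```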
Transmission Control Protocol (“TCP”) is a method used in combination with the Internet Protocol (“IP”) to send data in the form of message units, called packets, between computers over the Internet. TCP is known as a connection-oriented protocol, which means that a connection is established and maintained until such time as the message(s) to be exchanged by the application programs at each end of the connection have been exchanged. While IP handles the actual delivery of the data, TCP keeps track of the individual packets into which a message is divided for efficient routing through the Internet. From a system perspective, TCP is responsible for ensuring, at the transmitting end of the connection, that a message is divided into packets that can be transmitted using IP, and, at the receiving end of the connection, for reassembling the packets received via IP back into the complete message. For example, when application data, such as a Web page, is transmitted from a content server, the TCP program layer (what we prefer to call the TCP transmitter) in that server converts the application data, in this case an HTML file, into a serial byte stream, sequentially numbers each byte, and then forwards segments of the now-numbered byte stream to the resident IP program layer (what we prefer to call the IP transmitter). In general, each segment includes sufficient byte sequencing and length information to enable reassembly of the respective piece of the byte stream into the original application data.
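The transmitter-side numbering and segmentation can be sketched in Python as follows. This is a simplification under stated assumptions: the name `segment_stream` and the fixed `max_load` (loosely analogous to TCP's maximum segment size) are hypothetical, and bytes are numbered from 1 to match the example below, whereas real TCP starts from a chosen initial sequence number.

```python
from __future__ import annotations


def segment_stream(data: bytes, max_load: int,
                   first_seq: int = 1) -> list[tuple[int, int, bytes]]:
    """Number each byte of `data` and cut the numbered stream into segments.

    Returns (sequence number of first byte, length, payload) per segment.
    """
    segments = []
    for offset in range(0, len(data), max_load):
        payload = data[offset:offset + max_load]
        segments.append((first_seq + offset, len(payload), payload))
    return segments
```

Under these assumptions, a 100,000-byte stream cut into 100-byte loads yields 1,000 segments, the 997th of which starts at byte 99601.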
The IP transmitter encapsulates each segment into a respective IP packet for transmission via the Internet. Although each packet has the same destination IP address, it may get routed differently through the Internet, and, occasionally, may never arrive at the intended destination. At the receiving client server, the resident IP program layer (what we prefer to call the IP receiver) extracts the encapsulated segment and passes it to the resident TCP program layer (what we prefer to call the TCP receiver) for reassembly into the original byte stream. When an arriving segment contains bytes that are out of sequence with respect to the original byte stream, the TCP receiver will wait until all intervening bytes in the sequence have arrived before forwarding them to the application program. Thus, the application program is assured of receiving the application data in the original order, although not necessarily at a smooth or consistent rate of delivery.
The objective of TCP is to provide a reliable, connection-oriented delivery service. TCP views data as a stream of bytes, with each contiguous group of bytes being transferred as a separate and distinct segment; the exact number of bytes per segment is indicated in a respective field of the IP packet header. Data damage detection is handled by adding a checksum to each header. To provide the connection-oriented service, TCP takes care to ensure reliability, flow control, and connection maintenance. TCP is quite robust, being capable of recovering from data damage, loss, duplication, or out-of-sequence delivery. To do this, the TCP transmitter assigns a sequence number to each byte in each segment to be transmitted. For each segment received, the TCP receiver must return within a specified period an acknowledgment (“ACK”) that includes the sequence number of the next expected byte. Under certain conditions, this same ACK may be retransmitted by the TCP receiver (thus becoming a so-called “duplicate ACK”). For example, if a segment is detected as damaged by the TCP receiver, it will discard the segment and return the duplicate ACK. Similarly, if a segment is detected as having been received out of sequence, the TCP receiver will send the duplicate ACK. In both cases, upon receiving the duplicate ACK, the TCP transmitter will automatically resend the segment containing the byte having the indicated sequence number.
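The cumulative ACK behavior described above can be sketched in Python as follows. This is a deliberately simplified model under stated assumptions: the class `AckingReceiver` is hypothetical, bytes are numbered from 1, and checksums, damaged segments, windows, timers and the sender's retransmission rules are all omitted. The receiver always acknowledges the next expected byte, so an out-of-sequence arrival repeats the previous ACK (a duplicate ACK), and filling a gap lets the ACK jump past any segments already being held.

```python
from __future__ import annotations


class AckingReceiver:
    """Simplified cumulative-ACK logic for a TCP receiver (illustrative only)."""

    def __init__(self, first_seq: int = 1) -> None:
        self.next_expected = first_seq
        self.held: dict[int, int] = {}   # out-of-sequence segments: start -> length

    def receive(self, start: int, length: int) -> tuple[int, bool]:
        """Accept a segment; return (ACK number, whether it repeats the last ACK)."""
        previous_ack = self.next_expected
        if start == self.next_expected:
            self.next_expected += length
            # Absorb held segments that have now become contiguous.
            while self.next_expected in self.held:
                self.next_expected += self.held.pop(self.next_expected)
        elif start > self.next_expected:
            self.held[start] = length    # out of sequence: hold it and re-ACK
        # A segment starting below next_expected is old data; it is simply re-ACKed.
        return self.next_expected, self.next_expected == previous_ack
```

With 100-byte segments, receiving bytes 1 to 100 returns ACK 101; segments starting at 201 and 301 then each return a duplicate ACK 101; and once the segment starting at 101 arrives, the ACK advances to 401.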
In a typical TCP receiver, a reassembly process reassembles a multi-segment message using a linked list that is strongly ordered as a function of the sequence numbers assigned by the TCP transmitter. When out-of-order segments are received, the reassembly process first validates and then inserts each validated segment into the list at the proper position. The reassembly process will deliver a segment only after having determined that the segment is valid and the byte sequence contained therein is in order with respect to earlier-delivered segments.
By way of example, we have illustrated in FIG. 1 a typical instantiation of the TCP reassembly process as practiced on a digital data processing system incorporating a conventional, commercially available microprocessor, such as the Pentium® 4 from Intel Corporation. As of the instant illustrated, the TCP reassembly process has received, validated and linked a total of 998 segments onto the TCP segment list. We will assume for the purposes of this example that each segment has a load of 100 bytes. Thus, as of the illustrated instant, the segment list consists of three sections: a first section consisting of only segment S1 (containing bytes 1 through 100); a second section consisting of segments S3 (containing bytes 201 through 300) through S996 (containing bytes 99501 through 99600); and a third section consisting of segments S998 (containing bytes 99701 through 99800) through S1000 (containing bytes 99901 through 100000). In this not-unusual example, segment S2 (containing bytes 101 through 200) has either been rejected (e.g., because it failed the checksum validation test) or it simply failed to arrive (e.g., it got lost somewhere in the Internet), and the TCP reassembly process is awaiting retransmission. Segment S997 (containing bytes 99601 through 99700), on the other hand, has just arrived and been validated, and is awaiting insertion into the TCP segment list.
To accomplish insertion of segment S997, the TCP reassembly process must first access the TCP control block to retrieve the forward link to the first segment on the TCP segment list, namely segment S1. Since the ending sequence number of this first segment plus 1 (i.e., “101”) is not equal to the starting sequence number of the new segment (i.e., “99601”), the TCP reassembly process will walk to the next segment on the list, namely segment S3. Since the ending sequence number of this second segment plus 1 (i.e., “301”) is still not equal to the starting sequence number of the new segment (i.e., “99601”), the TCP reassembly process will continue the walk to the next segment on the list, namely segment S4. The TCP reassembly process will continue walking the list in this manner until segment S996 is reached. Since the ending sequence number of this segment plus 1 (i.e., “99601”) is equal to the starting sequence number of the new segment (i.e., “99601”), the TCP reassembly process will terminate the walk, and insert the new segment between segments S996 and S998 using the singly-linked list insertion operation described above. Accordingly, as shown in FIG. 2, the TCP segment list, after the insertion operation has been performed, will consist of only two sections: the first section still consisting of only segment S1 (containing bytes 1 through 100); and the second section which now consists of segments S3 through S1000 (containing, in total, bytes 201 through 100000). As you can see, the TCP reassembly process had to access a total of 995 list members before finding the correct insertion point.
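The walk in the FIG. 1 example can be reproduced with the following Python sketch, which counts the list members accessed. The `Segment` class and `insert_by_walk` function are hypothetical illustrations of the conventional approach described above, not the improved method of the invention; the sketch also assumes a member ending immediately before the new segment exists, as it does in this example.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Segment:
    start: int                           # sequence number of the first byte
    end: int                             # sequence number of the last byte
    forward: Optional[Segment] = None


def insert_by_walk(head: Segment, new: Segment) -> int:
    """Walk from the head until a member's end + 1 equals new.start, insert
    the new segment after that member, and return the members accessed."""
    accessed = 0
    member: Optional[Segment] = head
    while member is not None:
        accessed += 1
        if member.end + 1 == new.start:
            new.forward = member.forward  # step 1: copy the forward link
            member.forward = new          # step 2: link backward member to new one
            return accessed
        member = member.forward
    raise ValueError("no member ends immediately before the new segment")
```

Building the FIG. 1 list (S1, S3 through S996, and S998 through S1000, each carrying 100 bytes) and inserting S997 with this sketch reports 995 members accessed, matching the count given above; afterwards only the gap left by S2 remains.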
Although this example may appear to be a worst case scenario, it is, in fact, not that unusual. Given that many messages are quite long, comprising hundreds or, perhaps, thousands of segments, and that the Internet is getting more and more congested, the reassembly process can be a very compute-intensive operation, and current implementations tend to be too inefficient for high-speed networks. With the recent introduction of multi-gigabit-per-second Ethernet communication networks, the potential rate of delivery is so high that even an occasional loss/damage of a packet may exceed the capabilities of the client servers to manage the rapidly-accumulating out-of-order segments while awaiting retransmission of the lost/damaged segment. We submit that what is needed is a more efficient method for maintaining ordered linked lists, particularly for use in such applications as the TCP reassembly process.
In the drawings, similar elements will be similarly numbered whenever possible. However, this practice is simply for convenience of reference and to avoid unnecessary proliferation of numbers, and is not intended to imply or suggest that our invention requires identity in either function or structure in the several embodiments.