Information about a domain is generally communicated and processed as sequences of items (e.g., bits, characters, words, frames of video). Information is present both in the individual items themselves and in the order of the items (“TIP” has a different meaning than “PIT”).
Many sequence processing applications involve distinguishing sequences constructed from relatively small sets of items (e.g., “alphabets,” “lexicons”). For example, all English spoken words can be represented as strings (sequences) over about 40 phonemes; all English text words, as strings over 26 letters; and all English sentences, and thus all English texts of any length (and thus all describable concepts), as strings over ~100,000 words. In such cases, individual items or sequences of items typically occur numerous times across the domain (e.g., “CHAT,” “TACK,” “KITE,” “TRACK”).
One principle for communicating and processing sequential information of this type faster and more efficiently is to find item sub-sequences (i.e., item sequences that occur within longer sequences) that occur frequently in the domain and to assign a single code to represent each of them. This is an instance of the technique of “information compression.” Borrowing a term from cognitive psychology (1), a single code (in a coding space) that represents a sequence of items (in an item space) may be referred to as a “chunk code” or simply “chunk,” and the process of assigning a “chunk” to represent an item sequence and physically associating that chunk with the individual items comprising that sequence may be referred to as “chunking.” In discussing or analyzing any particular instance/example of chunking, the sequence of items to which a chunk will be assigned/associated may be called the “items to be chunked.”
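As an illustration of finding frequent sub-sequences and assigning a single code to each, the following is a minimal sketch only. The function names, the fixed sub-sequence length, the frequency threshold, and the use of integers as chunk codes are all assumptions made for this example and are not part of the invention.

```python
from collections import Counter

def frequent_subsequences(words, length, min_count):
    """Return contiguous sub-sequences of `length` items that occur
    at least `min_count` times across all words (hypothetical helper)."""
    counts = Counter(
        word[i:i + length]
        for word in words
        for i in range(len(word) - length + 1)
    )
    return sorted(seq for seq, n in counts.items() if n >= min_count)

def assign_chunk_codes(subsequences):
    """Assign one integer code per sub-sequence (illustrative coding space)."""
    return {seq: code for code, seq in enumerate(subsequences)}

words = ["CHAT", "TACK", "KITE", "TRACK"]
frequent = frequent_subsequences(words, length=2, min_count=2)
chunk_codes = assign_chunk_codes(frequent)
print(chunk_codes)
```

In this small domain only the two-letter sub-sequences shared by “TACK” and “TRACK” meet the threshold, so only those receive chunk codes.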
Chunking is a process of information compression. In general, any information processing system which uses chunking will also need to manipulate the individual items represented by a chunk. The inverse process of obtaining the individual items from a chunk is called “unchunking.” These correspond to the terms “packing” (“compressing”) and “unpacking” (“uncompressing”) used in the fields of communications and information processing.
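The inverse relationship between chunking and unchunking can be sketched as a round trip. This is a simplified illustration under stated assumptions: the codebook, the greedy left-to-right matching, and the integer codes are hypothetical and chosen only to make the example concrete.

```python
def chunk(items, codebook):
    """Replace each known sub-sequence with its chunk code, scanning left
    to right and preferring longer sub-sequences (assumed strategy)."""
    sequences = sorted(codebook, key=len, reverse=True)
    coded, i = [], 0
    while i < len(items):
        for seq in sequences:
            if items.startswith(seq, i):
                coded.append(codebook[seq])
                i += len(seq)
                break
        else:
            # No chunk applies here; pass the individual item through.
            coded.append(items[i])
            i += 1
    return coded

def unchunk(coded, codebook):
    """Recover the individual items from a mix of items and chunk codes."""
    inverse = {code: seq for seq, code in codebook.items()}
    return "".join(inverse.get(element, element) for element in coded)

codebook = {"AC": 0, "CK": 1}  # hypothetical chunk codes
coded = chunk("TRACK", codebook)
restored = unchunk(coded, codebook)
print(coded, restored)
```

The coded form is shorter than the original item sequence whenever at least one chunk applies, and unchunking restores the original items exactly.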
It is essential to understand that the process of chunking is not simply a process for remembering item order. The class of machines which can simply remember item order is a superset of the class of machines that do chunking. A simple finite state automaton (FSA) (FIG. 1A), or hidden Markov model (HMM) (FIG. 1B), or recurrent neural network (2, 3) (FIG. 1C), which changes state as a function of which item is currently being input and what its current state is, remembers item order, because one can look at the current state and know the history (order), at least probabilistically, of inputs that led to it. But these machines, in and of themselves, do not create chunks, physically associate such chunks with the comprising items, or perform the operations of chunking and unchunking. The invention described herein below, called “overcoding-and-paring” (OP), is specifically a method and embodiment for chunking (packing, compressing) and unchunking (unpacking, uncompressing) information.
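The distinction can be made concrete with a minimal deterministic state machine. The sketch below is an assumption-laden illustration (the transition table, state names, and “reject” state are hypothetical): its final state reflects the order of the inputs, yet it contains no chunk code associated with the items and offers no operation for recovering the items from such a code.

```python
def run_fsa(transitions, start, items):
    """Step through the items; the next state depends only on the
    current state and the current input item."""
    state = start
    for item in items:
        state = transitions.get((state, item), "reject")
    return state

# Hypothetical transition table distinguishing "TIP" from "PIT".
transitions = {
    ("start", "T"): "T", ("T", "I"): "TI", ("TI", "P"): "TIP",
    ("start", "P"): "P", ("P", "I"): "PI", ("PI", "T"): "PIT",
}

state_tip = run_fsa(transitions, "start", "TIP")
state_pit = run_fsa(transitions, "start", "PIT")
state_other = run_fsa(transitions, "start", "TPI")
print(state_tip, state_pit, state_other)
```

The same three items presented in different orders lead to different final states, which is order memory but not chunking.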
The chunking process (or “chunk assignment process”) can be decomposed into at least three component processes.
First, selecting, choosing, or assigning a particular code from the “chunk code space.” The chunk code space is the set of all possible settings of the units comprising the “chunk coding layer.” This component process may be referred to as the “chunk code selection process,” or “chunk choosing process,” or similar phraseologies.
Second, activating the selected chunk code. In a digital embodiment, such as in a computer, a chunk code is active if the set of memory locations representing the units that comprise it are all in the active state.
Third, physically associating or connecting the selected chunk code with the items to be chunked, which again, are codes or activity patterns of the input layer. This component process may be referred to as the “chunk code association process,” the “chunk-item association process,” or some similar phraseologies. In order to associate a chunk code with an item, both must be in the active state in the physical memory.
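The three component processes can be sketched in a digital embodiment as follows. This is a toy illustration, not the method of the invention: the class name, the representation of a chunk code as a tuple of unit indices, and the rule of selecting the next unused code are all assumptions introduced for the example.

```python
from itertools import combinations

class ChunkLayer:
    """Toy chunk coding layer: a code is a set of units that are all active."""

    def __init__(self, n_units, n_active):
        self.active = [False] * n_units
        # The chunk code space: every possible choice of n_active units.
        self._unused_codes = combinations(range(n_units), n_active)
        self.associations = {}  # chunk code -> item codes linked to it

    def select(self):
        """Component 1: choose a code from the chunk code space
        (assumed rule: take the next unused code)."""
        return next(self._unused_codes)

    def activate(self, code):
        """Component 2: put exactly the units of the code in the active state."""
        self.active = [unit in code for unit in range(len(self.active))]

    def associate(self, code, active_items):
        """Component 3: link the active chunk code to the active item codes."""
        if not all(self.active[unit] for unit in code):
            raise RuntimeError("chunk code must be active to be associated")
        self.associations.setdefault(code, []).extend(active_items)

layer = ChunkLayer(n_units=4, n_active=2)
code = layer.select()
layer.activate(code)
layer.associate(code, ["P", "R", "E"])
print(code, layer.active, layer.associations)
```

The check inside `associate` mirrors the requirement that a chunk code be in the active state at the time it is associated with an item.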
A chunking process must satisfy the following two opposing constraints:
Chunking constraint 1: A chunking process assigns unique chunk codes to unique item sequences, i.e., a chunk code depends on the particular items comprising the sequence and on their order. For example, different chunk codes would be assigned to “PRE” and “PRO.” For this to be the case, a chunking process must know the full sequence of items to be chunked at the time the chunk code is to be assigned. Therefore, the assignment may not occur until all the items have been presented. This would seem to imply that the assigned chunk code cannot be activated until all the items have been presented.
Chunking constraint 2: To accomplish the third component process described above, physically associating the selected chunk code with the item codes, each of those item codes must be active while the chunk code is active. This implies that the chunk code may have to be activated on the first item code of the sequence and remain active while all the remaining items of the sequence are presented. But it does not imply that, once an item is presented (and thus its code activated), it must remain active for the remainder of the duration that the chunk code is active.
Existing embodiments of the chunking process satisfy these two constraints by maintaining (keeping “active”) the codes of the individual items to be chunked until a chunk code unique to that sequence of items is selected, and activated, and associated with those item codes (4-12).
This technique may be described as temporarily “buffering” the items to be chunked in memory. The set of physical memory locations where those items are stored is referred to as the “buffer.” The number, M, of locations comprising the buffer is the “size” of the buffer, and M is therefore also the upper limit on the length of a sequence that can be assigned a unique chunk code.
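The buffering technique and its length limit can be sketched as follows. This is a minimal illustration of the buffered approach described above, not of the invention; the class name, the use of a dictionary as the codebook, and the integer chunk codes are assumptions made for the example.

```python
class ItemBuffer:
    """Toy buffer of M slots that holds items until a chunk code is assigned."""

    def __init__(self, size):
        self.size = size   # M, the number of slots
        self.slots = []    # items currently held active

    def present(self, item):
        """Store the next item; sequences longer than M cannot be held."""
        if len(self.slots) >= self.size:
            raise OverflowError("sequence exceeds buffer size M")
        self.slots.append(item)

    def assign_chunk(self, codebook):
        """Assign a code that is unique to the full buffered sequence,
        then release the buffered items."""
        sequence = tuple(self.slots)
        code = codebook.setdefault(sequence, len(codebook))
        self.slots.clear()
        return code

codebook = {}
buffer = ItemBuffer(size=3)
codes = []
for word in ["PRE", "PRO", "PRE"]:
    for item in word:
        buffer.present(item)
    codes.append(buffer.assign_chunk(codebook))
print(codes)
```

Because the code is chosen only after the whole sequence is in the buffer, “PRE” and “PRO” receive different codes, and a repeated “PRE” receives the same code as before.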
To the inventor's best knowledge, all prior descriptions of chunking assume that the buffer represents items using a localist representation (LR) scheme: in particular, each buffer location (“slot”) may hold the code of only one item at any given moment, and the physical embodiment of each slot is disjoint from the embodiment of all the other slots of the buffer (FIG. 2A). Such embodiments may also be viewed as having M copies of the input layer, one for each of the M slots of the buffer. This is the case for embodiments similar to Time Delay Neural Networks (TDNNs) (13) (FIG. 2B) and for embodiments employing the “tapped delay line” concept (14, 15) (FIG. 2C). In contrast, the invention (OP) requires only a single instance of the input layer, primarily because OP uses “sparse distributed representations” (SDR) for chunk codes as well as for items.