Statistical objects are used where conventional secured communications of an original object cannot be used due to protocol constraints or communications bandwidth limitations. Using a statistical object instead of an original object achieves much greater bandwidth efficiency due to the use of a deterministic statistical representation of the original object.
In this Specification, and in the Claims that follow, the term “statistical object” means a string of values produced by a random or “hash” function. The output of this function is a string of values which stands for or represents the input to the function. In FIG. 1, the output string of values is shown as x1, x2, x3 . . . xn. In this example, the input of the function is an original object, while the output is a statistical object.
If the output string is shorter than the input, “collisions” become possible. A collision results when two or more different inputs produce the same output, and is generally considered detrimental.
To mitigate the effects of collisions, additional inputs are added to the function. These inputs vary over time, enabling different streams of statistical objects to eventually diverge from their colliding tendencies. As an example, a clock may be employed to add a time value as an input. As an alternative, a counter may serve as an input. Multiple additional inputs may be used together in the generation of the statistical object.
In this Specification, and in the Claims that follow, the terms “function,” “random function” and “hash function” are intended to encompass any procedure or mathematical method that converts a large amount of data into a smaller amount of data. In one embodiment of the invention, the output may be a single integer or value which serves as an index to an array or database. According to Wikipedia, the output values of a hash function may be called hash values, hash codes, hash sums, checksums or hashes. The inputs to a hash function may be referred to as keys.
FIG. 2 illustrates a simple example of the operation of a hash function. A set of four inputs or keys is shown as the names Joe, Moe, Sam and Charlie. The hash function (x) associates a hash or output with each name. The input of Joe causes the hash function to point to the output of “03”. The input of Moe causes the hash function to point to the output of “01”. The input of Sam causes the hash function to point to the output of “06”. The input of Charlie causes the hash function to point to the output of “03”. Because the Joe and Charlie inputs both return a hash of “03”, this pair of inputs is said to cause a collision.
As noted above, the effect of this collision may be mitigated by adding a clock or a counter as an additional input to the hash function. So, in an alternative embodiment, if the input of Joe is provided to the hash function at 1:00 p.m., and the input of Charlie is provided to the hash function at 2:00 p.m., the differing time values would lessen the probability that this collision would occur.
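The mapping of FIG. 2 may be sketched in a few lines of Python. This is only an illustrative sketch: the function names toy_hash and find_collisions, the choice of SHA-256, and the default of eight possible outputs are assumptions made for the example and are not taken from the figure. The particular outputs produced by this sketch will generally differ from the values “03”, “01” and “06” shown in FIG. 2.

```python
import hashlib
from collections import defaultdict


def toy_hash(key: str, buckets: int = 8) -> int:
    """Map a key to one of `buckets` outputs (hypothetical stand-in for FIG. 2)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Keep only a small part of the digest so that the output space is tiny
    # and collisions are easy to observe.
    return int.from_bytes(digest[:4], "big") % buckets


def find_collisions(keys, buckets: int = 8):
    """Return {output: [keys, ...]} for every output shared by two or more keys."""
    by_output = defaultdict(list)
    for key in keys:
        by_output[toy_hash(key, buckets)].append(key)
    return {out: ks for out, ks in by_output.items() if len(ks) > 1}


if __name__ == "__main__":
    names = ["Joe", "Moe", "Sam", "Charlie"]
    for name in names:
        print(name, "->", toy_hash(name))
    print("collisions:", find_collisions(names))
```

Whenever there are more keys than possible outputs, at least one collision is unavoidable; with fewer keys, a collision may or may not occur depending on the hash values.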
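The use of a varying input may be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function names statistical_object and statistical_stream, the use of SHA-256, the 8-byte encoding of the varying input, and the 16-bit output width are all hypothetical choices made for the example. The varying input may be read as a counter value or as a clock reading expressed as an integer.

```python
import hashlib


def statistical_object(original: bytes, varying_input: int, n_bits: int = 16) -> int:
    """Hash an original object together with a varying deterministic input.

    `varying_input` stands for a counter or clock value (an assumption of this
    sketch); it must be a non-negative integer that fits in 8 bytes.
    """
    if not 1 <= n_bits <= 256:
        raise ValueError("n_bits must be between 1 and 256")
    data = varying_input.to_bytes(8, "big") + original
    digest = hashlib.sha256(data).digest()
    # Keep only the top n_bits of the 256-bit digest.
    return int.from_bytes(digest, "big") >> (256 - n_bits)


def statistical_stream(original: bytes, start: int, count: int, n_bits: int = 16):
    """Generate `count` successive statistical objects using a counter input."""
    return [statistical_object(original, start + i, n_bits) for i in range(count)]


if __name__ == "__main__":
    print("Joe:    ", statistical_stream(b"Joe", 0, 4))
    print("Charlie:", statistical_stream(b"Charlie", 0, 4))
```

Two original objects may still produce the same statistical object at a given counter value, but because the counter changes every output, the two streams are overwhelmingly likely to diverge after a few values.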
One advantage of using statistical objects is that information may be “concentrated” in a relatively smaller number of transmitted bits, which increases the efficiency of communication across a network.
A consequence of using a deterministic statistical representation is that the representation is not guaranteed to identify uniquely the source original object. The deterministic statistical representation, the statistical object, may be generally considered to be the output of a hash or similar function of the original object along with one or more varying deterministic inputs such as a clock or counter. These varying deterministic inputs are necessary so that the cumulative stream of output statistical objects generated from a single original object is unique across a large number of generated statistical objects. Unambiguously mapping a statistical object back to a unique original object is essentially an exercise in mitigating the effects of the birthday problem. In this context, the birthday problem concerns the probability that the hashes of different original objects, together with their respective deterministic inputs, produce identical statistical objects. The generation of the same statistical object by two or more original objects constitutes a collision.
FIG. 3 supplies a graph that illustrates the birthday problem. The number of individuals in any given group is shown on the x-axis. The y-axis shows an approximate probability, on a scale from zero to one, that at least two people in the group will share the same birthday. As an example, in a group of twenty-three people, the probability that at least two persons in this group will have the same birthday is about fifty percent.
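The curve of FIG. 3 may be reproduced with a short calculation. The sketch below assumes 365 equally likely birthdays and ignores leap years; the function name birthday_collision_probability is chosen only for this example. It computes the probability that all birthdays in the group are distinct and subtracts that value from one.

```python
def birthday_collision_probability(n: int, days: int = 365) -> float:
    """Probability that at least two of `n` people share a birthday.

    Assumes `days` equally likely birthdays (an idealization).
    """
    if n > days:
        return 1.0  # more people than days: a shared birthday is certain
    p_all_distinct = 1.0
    for i in range(n):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    for group_size in (10, 22, 23, 30, 50):
        p = birthday_collision_probability(group_size)
        print(f"{group_size:3d} people: {p:.4f}")
```

Under these assumptions, a group of twenty-three is the smallest group for which the probability exceeds one half, at approximately 0.507.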
The birthday problem may be understood as an example of the hash function depicted in FIG. 2. In the birthday problem, the keys or inputs are the names of the individuals in the group. The hash function maps these inputs to one of the hashes or outputs, which represent the days of the year. If two persons in the group share the same birthday, the hash function points to the same day for two different individuals, and a collision occurs.
Given a uniform distribution, the probability of a collision increases with the number of statistical objects in use. A mechanism to unambiguously identify statistical objects back to their original objects would constitute a major technological advance, and would satisfy long felt needs and aspirations in the cyber security industry.
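The growth of the collision probability with the number of statistical objects in use may be estimated with the standard birthday approximation, 1 - exp(-k(k-1)/(2N)), where k objects are drawn uniformly from N possible values. The sketch below is illustrative only: the function name approx_collision_probability and the 32-bit object width used in the demonstration are assumptions made for the example and are not taken from the Specification.

```python
import math


def approx_collision_probability(k: int, space_size: int) -> float:
    """Approximate probability of at least one collision among `k` values
    drawn uniformly at random from `space_size` possible values."""
    if k < 2:
        return 0.0
    exponent = -k * (k - 1) / (2 * space_size)
    # -expm1(x) equals 1 - exp(x) and stays accurate when x is close to zero.
    return -math.expm1(exponent)


if __name__ == "__main__":
    n_bits = 32  # hypothetical statistical object width
    space = 2 ** n_bits
    for k in (1_000, 10_000, 65_536, 200_000):
        p = approx_collision_probability(k, space)
        print(f"{k:7d} objects in use: {p:.4f}")
```

With N possible values, the collision probability becomes substantial once the number of objects in use approaches the square root of N, which is why a short statistical object alone cannot be relied upon to identify its original object.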