1. Field
The present application relates generally to artificial intelligence systems and methods, and, more specifically, to pseudo-genetic meta-knowledge based artificial intelligence systems and methods.
2. Background
Artificial intelligence systems seek to provide human-like behavior to autonomous actors. An actor may be an entity programmed to interact in an environment. An actor can range from something as simple as a computer opponent in online poker (a simulated environment) to a drone executing a complex series of military maneuvers (a real environment). To produce actors, some systems include genetic algorithms. Genetic algorithms may include Boolean variables to express hypothetical actors. Each of these Boolean variables may be considered a “gene.” When two parents “mate” to form offspring, their children may inherit Boolean “genes” as determined by a global “Genetic Operator” function.
Some systems include genetic programming. Genetic programming generally refers to algorithms which use mathematical functions to express a hypothesis. These mathematical functions may be created by ever-elongating chains of operators, static numbers, and variables, each of which is considered a “gene.” When parents “mate” to form offspring, their children may inherit “chains” of mathematical operations as determined by a global “Genetic Operator” function.
A genetic operator generally refers to a “mating” operation which mixes the genetic information from genetic algorithm or genetic programming parents into their offspring. The genetic operator is a static function that is hard-coded at the start of the simulation. Specific examples of “mating” operations include: single-point crossover, two-point crossover, uniform crossover, and point mutation.
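By way of illustration only, the four “mating” operations named above may be sketched as follows for actors expressed as lists of Boolean “genes.” The function names, the list-of-Booleans encoding, and the 50% inheritance probability in the uniform crossover are assumptions made for this sketch and are not taken from any particular system.

```python
import random

def single_point_crossover(a, b, rng):
    # Child takes parent a's genes up to one cut point, then parent b's.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def two_point_crossover(a, b, rng):
    # Child takes parent b's genes between two distinct cut points
    # and parent a's genes elsewhere (requires at least 3 genes).
    i, j = sorted(rng.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:]

def uniform_crossover(a, b, rng):
    # Each gene is inherited independently from either parent
    # (assumed probability: 0.5 per parent).
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

def point_mutation(genome, rng):
    # Flip exactly one randomly chosen Boolean gene.
    i = rng.randrange(len(genome))
    child = list(genome)
    child[i] = not child[i]
    return child

if __name__ == "__main__":
    rng = random.Random(42)
    parent_a = [True] * 8
    parent_b = [False] * 8
    print(single_point_crossover(parent_a, parent_b, rng))
    print(two_point_crossover(parent_a, parent_b, rng))
    print(uniform_crossover(parent_a, parent_b, rng))
    print(point_mutation(parent_a, rng))
```

In this sketch the operator is selected once by the programmer and applied identically to every pair of parents, which reflects the static, hard-coded character described above.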
The genetic algorithms and genetic programs hold physical attributes static while the artificial intelligence is permitted to evolve over time. Accordingly, all actors have the same physical characteristics. Thus, the actors generated through these methods are a reflection of behavior evolution which may be tied to a specific physical form.
Furthermore, the evolution process of genetic algorithms and genetic programs evaluates actors based on a fixed learning parameter. For example, reward functions may be chosen at the time of programming, and are assigned a static weight function by the simulation designer. A reward function generally refers to an immediate, quantitatively measurable result obtained by an actor based on its environment. For example, an actor used for blackjack card game simulation could have its reward function consider the dollar amount won in every game. Reward function values in an infinite sample may also attenuate over time, but only as dictated by a single depreciation variable. The simulation designer may select weights for the reward functions and depreciation variables which appear to be desirable, but in practice do not generate the optimal results in a given environment. This can only be discovered through thousands of simulated experiments and results, which can be prohibitively time-consuming.
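As a minimal sketch of the fixed evaluation described above, the following assumes that the designer-chosen weights form a simple weighted sum and that the single depreciation variable acts as a geometric discount on rewards obtained at later steps. Both assumptions, together with the names `weighted_reward`, `discounted_return`, and the example signal names, are hypothetical and chosen only for illustration.

```python
def weighted_reward(signals, weights):
    # Combine measurable results using static, designer-chosen weights.
    return sum(weights[name] * value for name, value in signals.items())

def discounted_return(rewards, depreciation):
    # Attenuate each successive reward by one fixed depreciation
    # variable (assumed here to be a geometric discount factor).
    total = 0.0
    for step, reward in enumerate(rewards):
        total += (depreciation ** step) * reward
    return total

if __name__ == "__main__":
    # Hypothetical blackjack-style signals and weights.
    weights = {"dollars_won": 1.0, "hands_played": -0.5}
    game = {"dollars_won": 10.0, "hands_played": 2.0}
    print(weighted_reward(game, weights))
    print(discounted_return([1.0, 1.0, 1.0], 0.5))
```

Because the weights and the depreciation variable are set before the simulation runs, a poor choice can only be detected by rerunning the simulation with different values.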
In addition, many reward functions rely on the Markov Property (MP) or Markov Decision Process (MDP), which presume that historical states or actions do not impact current or future decisions. This may limit a simulation's ability to recognize stagnation in macroscopic behavior patterns which can be exploited by observant humans.
In many systems, the reward function may determine which actors are selected for reproduction, and which are culled from the testing pool. The timing for reproduction is generally universal for all actors; that is, the simulation allocates a time for reproduction. While this can lead to rapid convergence to a set of successful behaviors, it can also lead to rapid homogenization of a population. This mass homogenization may create a universal weakness in the actors when a single exploit is found.
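The selection and culling step described above may be sketched as a single synchronized generation update. The function name `generation_step`, the truncation-style selection, and the `survivors` parameter are assumptions made for this illustration; the `reward` and `mate` arguments stand in for whatever reward function and genetic operator a given system uses.

```python
import random

def generation_step(population, reward, mate, survivors, rng):
    # Rank every actor by its reward; all actors are evaluated and
    # reproduce at the same simulation-allocated time.
    ranked = sorted(population, key=reward, reverse=True)
    # Keep the top actors as parents; the rest are culled.
    parents = ranked[:survivors]
    children = []
    # Refill the pool to its original size with offspring of the
    # surviving parents (requires survivors >= 2).
    while len(parents) + len(children) < len(population):
        a, b = rng.sample(parents, 2)
        children.append(mate(a, b, rng))
    return parents + children

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical actors: Boolean genomes rewarded by their count of True genes.
    pool = [[rng.random() < 0.5 for _ in range(6)] for _ in range(10)]
    uniform = lambda a, b, r: [x if r.random() < 0.5 else y for x, y in zip(a, b)]
    for _ in range(5):
        pool = generation_step(pool, sum, uniform, 4, rng)
    print(pool)
```

Since every child descends from the same small set of parents, repeated application of such a step tends to narrow the variety of genomes in the pool, which is the homogenization concern noted above.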
Accordingly, improved artificial intelligence systems and methods are desirable.