The invention generally relates to robots, and more specifically to interpreting human-robot instructions.
In general, humans can ground natural language commands to tasks at both abstract and fine-grained levels of specificity. For instance, a human forklift operator can be instructed to perform a high-level action, like “grab a pallet” or a low-level action like “tilt back a little bit.” While robots are also capable of grounding language commands to tasks, previous methods implicitly assume that all commands and tasks reside at a single, fixed level of abstraction. Additionally, those approaches that do not use abstraction experience inefficient planning and execution times due to the large, intractable state-action spaces, which closely resemble real world complexity.
Existing approaches generally map between natural language commands and a formal representation at some fixed level of abstraction. While effective at directing robots to complete predefined tasks, mapping to fixed sequences of robot actions is unreliable when faced with a changing or stochastic environment.