Traditional task and motion planning (TAMP) systems for robot manipulation use cases operate on static models that often fail in new environments. Integrating perception with manipulation is a solution to this challenge, enabling robots to update plans mid-execution and adapt to dynamic scenarios.

In this edition of the NVIDIA Robotics Research and Development Digest (R²D²), we explore the use of perception-based TAMP and GPU-accelerated TAMP for long-horizon manipulation. We’ll also learn about a framework for improving robot manipulation skills. And we’ll show how vision and language can be used to translate pixels into subgoals, affordances, and differentiable constraints.

Subgoals are smaller intermediate objectives that guide the robot step-by-step toward the final goal.

Affordances describe the actions that an object or environment allows a robot to perform, based on its properties and context. For instance, a handle affords “grasping,” a button affords “pressing,” and a cup affords “pouring.”

Differentiable constraints in robot-motion planning ensure that the robot’s movements satisfy physical limits (like joint angles, collision avoidance, or end-effector positions) while still being adjustable via learning. Because they’re differentiable, GPUs can compute and refine them efficiently during training or real-time planning.