Geometric Fabrics: a Safe Guiding Medium for Policy Learning

Fabrics-Guided Policies (FGP) are trained via large-scale reinforcement learning in simulation on top of a geometric fabric layer. The geometric fabric encodes behaviors such as establishing fingertip contact and finger curling, and it upholds hardware constraints such as joint position, acceleration, and jerk limits. These behaviors guide RL and simplify reward engineering. The framework is applied to in-hand cube reorientation by a highly actuated hand and achieves breakthrough policy performance in the real world.

Abstract

Robotics policies are always subject to complex, second-order dynamics that entangle their actions with the resulting states. In reinforcement learning (RL) contexts, policies bear the burden of deciphering these complicated interactions from massive amounts of experience and complex reward functions in order to learn how to accomplish tasks. Moreover, policies typically issue actions directly to controllers such as Operational Space Control (OSC) or joint PD control, which induce straight-line motion toward the action targets in task or joint space. However, straight-line motion in these spaces for the most part does not capture the rich, nonlinear behavior our robots need to exhibit, which shifts the burden of discovering these behaviors more completely to the agent. Unlike these simpler controllers, geometric fabrics capture a much richer and more desirable set of behaviors via artificial, second-order dynamics grounded in nonlinear geometry. These artificial dynamics shift the uncontrolled dynamics of a robot via an appropriate control law to form behavioral dynamics. Behavioral dynamics unlock a new action space and safe, guiding behavior over which RL policies are trained. They enable bang-bang-like RL policy actions that are still safe for real robots, simplify reward engineering, and help sequence real-world, high-performance policies. We describe the framework in general terms and create a specific instantiation for the problem of dexterous, in-hand reorientation of a cube by a highly actuated robot hand.
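To make the idea of a policy acting through artificial second-order dynamics more concrete, here is a minimal toy sketch for a single joint. It is not the paper's geometric fabric: it swaps in a plain critically damped attractor with a clipped target and a clamped acceleration, and every name and value in it (`K`, `B`, `Q_MIN`, `Q_MAX`, `ACC_MAX`, `DT`, `layer_step`, `rollout`) is a hypothetical choice made for illustration. It only shows how bang-bang-like actions can pass through such a layer while the resulting joint motion stays within assumed position and acceleration limits.

```python
import numpy as np

# Hypothetical toy parameters; none of these values come from the paper.
K, B = 25.0, 10.0          # attractor stiffness and damping (critically damped)
Q_MIN, Q_MAX = -1.0, 1.0   # assumed joint position limits (rad)
ACC_MAX = 30.0             # assumed joint acceleration limit (rad/s^2)
DT = 0.01                  # integration step (s)


def layer_step(q, qd, action):
    """Advance the toy second-order layer by one step for a scalar joint."""
    # The policy action only selects an attractor target inside the limits.
    target = np.clip(action, Q_MIN, Q_MAX)
    # Damped attractor acceleration, clamped to the assumed hardware limit.
    qdd = float(np.clip(K * (target - q) - B * qd, -ACC_MAX, ACC_MAX))
    # Semi-implicit Euler integration: update velocity, then position.
    qd = qd + DT * qdd
    q = q + DT * qd
    return q, qd, qdd


def rollout(actions, q0=0.0):
    """Integrate the layer from rest over a sequence of policy actions."""
    q, qd = q0, 0.0
    positions, accelerations = [], []
    for a in actions:
        q, qd, qdd = layer_step(q, qd, a)
        positions.append(q)
        accelerations.append(qdd)
    return np.array(positions), np.array(accelerations)


# Bang-bang-like actions that jump between values outside the joint limits.
actions = np.where(np.arange(1200) % 400 < 200, 2.0, -2.0)
positions, accelerations = rollout(actions)
```

In this toy setting the action never commands the joint directly; it only steers the layer's dynamics, which is the role the abstract assigns to behavioral dynamics. A real geometric fabric would replace the linear attractor with nonlinear geometric terms and would also handle constraints such as jerk limits, which this sketch does not attempt.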

Highlight Video

Video is 1x real-time.

Presentation Video

Video is 1x real-time.

FGP β = 40 (best setting)

Video is 1x real-time.

FGP β = 40, Robustness to Disturbances

Video is 1x real-time.

FGP β = 2.5

Video is 1x real-time.

FGP β = 10

Video is 1x real-time.

FGP β = 20

Video is 1x real-time.

FGP β = 30

Video is 1x real-time.

FGP β = 40, 186 CS Run

Video is 5x real-time.

FGP β = 50

Video is 1x real-time.

DeXtreme (new)

Video is 1x real-time.

DeXtreme (new) 670 CS Run

Video is 5x real-time.