Humans have evolved to link emotions and physical actions to intentions. This is why we may approach someone who greets us with open arms but flee from someone who approaches with threatening body language. While AI can accurately identify emotions, it struggles to derive intention from them. A research group, with the help of performers from Japan and Taiwan, has developed a way to help bridge this gap.
Imagine a figure approaching in the distance. Before seeing their face or hearing their voice, you must instantly decide: friend or threat? While humans effortlessly read subtle body language to make this survival judgment, artificial intelligence (AI) continues to struggle. Historically, AI research has focused on recognizing basic emotions (like happiness) or physical actions (like walking) while ignoring social intention: the social signals directed at others. For a service robot or AI agent, knowing whether a person poses a threat is far more important than simply identifying their emotion.
Now, researchers have established a new benchmark for "embodied social intention," uncovering how we signal threats and revealing a critical "alignment gap" between human cognition and AI. They are presenting their work this week at the 20th IEEE International Conference on Automatic Face and Gesture Recognition (FG2026), held in Kyoto. The paper is titled "Friend or Foe? Benchmarking Human Perception and ST-GCN Decoding of Embodied Social Intention."













