Highlights
- Learns complementary saliency maps from RGB textures and skeleton motion to reduce false positives on step sequences.
- Temporal attention fusion emphasises take-off and landing frames without sacrificing spin recall.
- Provides an uncertainty head whose outputs are surfaced as confidence intervals on GOE (Grade of Execution) scores.
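The fusion and uncertainty ideas above can be illustrated with a small sketch. It is not the authors' architecture. It assumes that per-frame RGB and skeleton features are already extracted, that each frame is scored by a single linear projection `w` (hypothetical), and that the uncertainty head predicts a Gaussian mean and log-variance (also an assumption). The interval uses z = 1.96 for roughly 95% coverage and is clipped to the current ISU GOE scale of -5 to +5.

```python
import math
from typing import List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention_fuse(
    rgb: Sequence[Sequence[float]],
    skel: Sequence[Sequence[float]],
    w: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Hypothetical fusion: concatenate per-frame RGB and skeleton features,
    score each frame with a linear projection w, softmax the scores over time,
    and return (attention-weighted clip feature, per-frame weights).
    Frames such as take-off and landing would ideally receive high weights."""
    if len(rgb) != len(skel):
        raise ValueError("both streams must have the same number of frames")
    frames = [list(r) + list(s) for r, s in zip(rgb, skel)]
    scores = [sum(wi * xi for wi, xi in zip(w, f)) for f in frames]
    alpha = softmax(scores)
    dim = len(frames[0])
    fused = [sum(a * f[d] for a, f in zip(alpha, frames)) for d in range(dim)]
    return fused, alpha


def goe_interval(mean: float, log_var: float, z: float = 1.96) -> Tuple[float, float]:
    """Assumed Gaussian head: convert (mean, log-variance) into an approximate
    confidence interval, clipped to the GOE range [-5, +5]."""
    std = math.exp(0.5 * log_var)
    return max(-5.0, mean - z * std), min(5.0, mean + z * std)
```

For example, with `rgb = [[0.1, 0.0], [0.9, 0.2], [0.1, 0.1]]`, `skel = [[0.0], [1.0], [0.0]]` and `w = [1.0, 0.0, 2.0]`, the middle frame receives most of the attention mass. Returning the weights alongside the fused feature makes it easy to check which frames the model emphasises.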
Implementation Notes
This approach benefits from higher capture frame rates: the authors recommend recording at 60 fps and down-sampling to 30 fps with motion-blur augmentation. When integrating with Keyframe-TSN, share the pose backbone between the two models to cut inference cost.
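A minimal sketch of the recommended frame-rate pipeline follows. The source does not say how the motion blur is produced, so this assumes it is synthesized by averaging each pair of consecutive 60 fps frames into one 30 fps frame, roughly mimicking a longer exposure. The `blur_prob` knob and the flattened-frame representation are hypothetical.

```python
import random
from typing import List, Optional, Sequence

Frame = List[float]  # hypothetical: one flattened grayscale frame


def downsample_with_blur(
    frames_60: Sequence[Frame],
    blur_prob: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[Frame]:
    """Halve the frame rate (60 fps -> 30 fps) by consuming frames in pairs.

    With probability blur_prob the output frame is the pixel-wise mean of the
    pair (synthetic motion blur); otherwise the first frame of the pair is kept.
    A trailing unpaired frame is dropped.
    """
    if rng is None:
        rng = random.Random()
    out: List[Frame] = []
    for i in range(0, len(frames_60) - 1, 2):
        a, b = frames_60[i], frames_60[i + 1]
        if rng.random() < blur_prob:
            out.append([(x + y) / 2.0 for x, y in zip(a, b)])
        else:
            out.append(list(a))
    return out
```

Applying blur stochastically rather than to every frame keeps some sharp frames in training, which is one plausible reading of "augmentation". Pass a seeded `random.Random` for reproducible runs.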