RGB-Skeleton Fusion (Liu et al.)

Highlights

Learns complementary saliency maps from RGB textures and skeleton motion to reduce false positives on step sequences.
Temporal attention fusion emphasises take-off and landing frames without sacrificing spin recall.
Provides an uncertainty head whose output is surfaced as confidence intervals on the Grade of Execution (GOE) score.
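To make the temporal attention idea concrete, here is a minimal sketch, not the authors' implementation. It assumes per-frame feature vectors have already been extracted for each modality, concatenates them, scores every frame with a single weight vector, and pools the clip with softmax weights. The names `temporal_attention_fuse` and `score_weights` are hypothetical, and the source does not specify the actual scoring network.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a sequence of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention_fuse(
    rgb_feats: Sequence[Sequence[float]],
    skel_feats: Sequence[Sequence[float]],
    score_weights: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Fuse per-frame RGB and skeleton features into one clip-level vector.

    rgb_feats, skel_feats: one feature vector per frame (same frame count).
    score_weights: assumed learned vector, one weight per fused dimension.
    Returns (clip_vector, per_frame_attention).
    """
    if len(rgb_feats) != len(skel_feats) or not rgb_feats:
        raise ValueError("both modalities need the same non-zero frame count")

    # Concatenate the two modalities frame by frame.
    fused = [list(r) + list(s) for r, s in zip(rgb_feats, skel_feats)]
    dim = len(fused[0])
    if len(score_weights) != dim:
        raise ValueError("score_weights must match the fused feature size")

    # One scalar score per frame, then normalise over time.
    scores = [sum(w * x for w, x in zip(score_weights, f)) for f in fused]
    attn = softmax(scores)

    # Attention-weighted sum over frames gives the clip representation.
    clip = [sum(a * f[d] for a, f in zip(attn, fused)) for d in range(dim)]
    return clip, attn
```

In this sketch, frames with a high score (for example, a take-off or landing frame with strong activations in both modalities) dominate the pooled vector, while the remaining frames still contribute with non-zero weight.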

Implementation Notes

This approach benefits from higher frame rates; the authors recommend capturing at 60 fps and down-sampling to 30 fps with motion-blur augmentation. When integrating with Keyframe-TSN, share the pose backbone between the two models to cut inference cost.
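One possible reading of the frame-rate recommendation is sketched below. The source does not say how the motion blur is produced; this example assumes it is approximated by blending each pair of adjacent 60 fps frames into one 30 fps frame. The function name `downsample_with_blur` and the `blur` parameter are hypothetical, and frames are represented as flat lists of pixel intensities only to keep the example self-contained.

```python
from typing import List, Sequence


def downsample_with_blur(
    frames: Sequence[Sequence[float]], blur: float = 0.5
) -> List[List[float]]:
    """Halve the frame rate by blending each adjacent pair of frames.

    frames: frames at the source rate (e.g. 60 fps), each a flat list of
            pixel intensities of equal length.
    blur:   weight of the second frame in each pair, in [0, 1].
            0.0 is plain decimation; 0.5 is an even two-frame average.
    """
    if not 0.0 <= blur <= 1.0:
        raise ValueError("blur must lie in [0, 1]")

    out: List[List[float]] = []
    for i in range(0, len(frames), 2):
        first = frames[i]
        # A trailing unpaired frame is blended with itself, i.e. kept as-is.
        second = frames[i + 1] if i + 1 < len(frames) else first
        out.append([(1.0 - blur) * a + blur * b for a, b in zip(first, second)])
    return out
```

As an augmentation, `blur` could be sampled at random per clip during training so the model sees both sharp and smeared motion; that sampling policy is likewise an assumption rather than something the source states.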

Useful datasets

FiSV Dataset

FSD-10

SkatingVerse Challenge