RGB-Skeleton Fusion (Liu et al.)

Multimodal network that fuses RGB features with pose trajectories to keep element spotting robust under occlusion.

Primary tasks
Element detection, Element classification
Modalities
RGB video, 2D pose keypoints
Architecture
Dual-stream CNN + temporal attention fusion
Frameworks
PyTorch
Availability
research
Maintainer
Shanghai Jiao Tong University
Released
Nov 2022

Highlights

Implementation Notes

This approach benefits from higher frame rates: the authors recommend 60 fps source video down-sampled to 30 fps, combined with motion blur augmentation. When integrating with Keyframe-TSN, share the pose backbone between the two models to cut inference cost.
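
The frame-rate recommendation can be illustrated with a minimal sketch. It is not the authors' pipeline. The function name `downsample_60_to_30`, the `blur_prob` parameter, and the choice to approximate motion blur by averaging each kept frame with the frame that would otherwise be dropped are all assumptions made for illustration; the source does not say how the blur is generated. Frames are represented as flat lists of pixel intensities to keep the example dependency-free.

```python
import random
from typing import List, Optional, Sequence

# Illustrative stand-in for an image array: one frame flattened to pixel intensities.
Frame = List[float]


def downsample_60_to_30(
    frames: Sequence[Frame],
    blur_prob: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[Frame]:
    """Halve the frame rate, optionally simulating motion blur.

    Assumption: blur is approximated by averaging a kept frame with its
    dropped neighbour, which mimics a longer exposure. This is a hypothetical
    augmentation, not one confirmed by the source.
    """
    if rng is None:
        rng = random.Random()

    out: List[Frame] = []
    # Walk the clip in pairs (0, 1), (2, 3), ...; a trailing unpaired frame is kept unchanged.
    for i in range(0, len(frames), 2):
        kept = frames[i]
        has_neighbour = i + 1 < len(frames)
        if has_neighbour and rng.random() < blur_prob:
            dropped = frames[i + 1]
            out.append([(a + b) / 2.0 for a, b in zip(kept, dropped)])
        else:
            out.append(list(kept))
    return out


# Usage: one second of synthetic 60 fps footage, each frame holding 4 pixels.
clip_60fps = [[float(t)] * 4 for t in range(60)]
clip_30fps = downsample_60_to_30(clip_60fps, blur_prob=0.5, rng=random.Random(0))
print(len(clip_30fps))  # half as many frames as the input
```

Blending in the dropped frame, rather than simply discarding it, is one way to make use of the extra temporal information in 60 fps footage. Setting `blur_prob=0.0` reduces the function to plain frame skipping.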