Highlights
- Normalises element naming across ISU rule updates, enabling longitudinal performance comparisons.
- Includes multi-annotator agreement metrics and consensus labels to highlight contentious calls (see the consensus sketch below).
- Bundles scoring calculators and validation scripts to ensure panel-logic consistency (see the validation sketch below).
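A minimal sketch of how consensus labels and a simple agreement score might be derived from multiple annotators; the label layout (one dict per clip, keyed by annotator) is hypothetical and not the actual FSBench schema or agreement metric:

```python
from collections import Counter
from typing import Dict, List, Tuple

def consensus_and_agreement(labels: List[Dict[str, str]]) -> Tuple[List[str], float]:
    """Majority-vote consensus per item plus the fraction of unanimous items."""
    consensus, unanimous = [], 0
    for item in labels:
        label, votes = Counter(item.values()).most_common(1)[0]
        consensus.append(label)
        unanimous += votes == len(item)
    return consensus, unanimous / len(labels)

# Example: two clips rated by three annotators; the second call is contentious.
labels = [
    {"a1": "3A", "a2": "3A", "a3": "3A"},
    {"a1": "4T", "a2": "4T<", "a3": "4T"},
]
print(consensus_and_agreement(labels))  # (['3A', '4T'], 0.5)
```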
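A minimal sketch of the kind of consistency check such a validation script might perform; the field names (`elements`, `components`, `deductions`, `total_segment_score`) are hypothetical and may not match the bundled scripts:

```python
from typing import Mapping

def check_segment_total(record: Mapping, tol: float = 1e-6) -> bool:
    """Return True if the recorded segment total matches its constituent parts."""
    technical = sum(e["base_value"] + e["goe"] for e in record["elements"])
    components = sum(c["score"] for c in record["components"])
    expected = technical + components - record.get("deductions", 0.0)
    return abs(expected - record["total_segment_score"]) <= tol

# Example record with one jump element and two program components.
record = {
    "elements": [{"base_value": 9.5, "goe": 1.9}],
    "components": [{"score": 8.75}, {"score": 9.0}],
    "deductions": 1.0,
    "total_segment_score": 28.15,
}
print(check_segment_total(record))  # True
```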
Notes for Practitioners
FSBench emphasises reproducibility: when publishing results, cite the schema version and the hash of the splits used. The evaluation toolkit expects deterministic floating-point behaviour, so enable double precision when integrating custom metrics. Access is prioritised for groups contributing new annotations or validator scripts.
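A minimal sketch of both practices, assuming the splits are plain files on disk; the paths shown and the example metric are illustrative and not part of the FSBench toolkit:

```python
import hashlib
from pathlib import Path

import numpy as np

def split_hash(*paths: str) -> str:
    """SHA-256 over the split files, suitable for citing alongside results."""
    digest = hashlib.sha256()
    for p in sorted(paths):
        digest.update(Path(p).read_bytes())
    return digest.hexdigest()

def mean_abs_error(pred, target) -> float:
    """Custom metric forced to double precision for deterministic results."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return float(np.mean(np.abs(pred - target)))

# Hypothetical usage; substitute the split files you actually evaluated on.
# print(split_hash("splits/train.txt", "splits/test.txt"))
```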