Automated Video Assistant Referee in Lead Climbing
DOI:
https://doi.org/10.36950/2025.2ciss066
Keywords:
markerless tracking, pose estimation, sports technology
Abstract
Introduction In lead climbing, points are awarded for each hold held, and these points primarily determine the ranking. A judge who observed the athlete awards the points immediately after the attempt. Competition organisers are required to make video recordings, but these are only viewed by the head referee in case of an appeal.
Our work aims to enhance the objectivity of judging. Our approach is based on applying pose estimators to video footage that is recorded anyway, and a subsequent analysis that determines metrics for judging. We want to tune our approach based on the scores given by judges while considering regulations of the International Federation of Sport Climbing.
Methods Our video footage was filmed by the Swiss Alpine Club for judging at the European Climbing Championships 2024 in Villar and at the Swiss Championships 2024 in Lenzburg. Initially, twenty videos (30 seconds each, including the fall from the wall towards the end) were extracted. Half of the videos were cropped to focus on the athlete. Thus, ten videos from Villar (outdoor) were analysed, five uncropped (five men) and five with focus (five women), as well as ten videos from Lenzburg (indoor), five uncropped (five women) and five with focus (five men).
As a first step, two pose estimators were evaluated regarding processing speed in frames per second (FPS) and the percentage of frames in which a pose was detected: BlazePose from Google, which has already been used for climbing research (Kim et al., 2023), and Real-Time Multi-Person Pose Estimation (RTMO) (Lu et al., 2024).
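As an illustration only, the two evaluation metrics could be computed along the following lines. This is a minimal sketch, not the evaluation code used in this work: the function name `benchmark_pose_estimator` and the `detect` callable (which is assumed to return `None` when no pose is found) are hypothetical placeholders for whichever pose estimator is wrapped.

```python
import time
from typing import Any, Callable, Iterable, Optional, Tuple


def benchmark_pose_estimator(
    detect: Callable[[Any], Optional[Any]],
    frames: Iterable[Any],
) -> Tuple[float, float]:
    """Return (frames per second, percentage of frames with a detected pose).

    `detect` is a hypothetical wrapper around a pose estimator; it is assumed
    to return None when no pose is found in a frame.
    """
    n_frames = 0
    n_detected = 0
    start = time.perf_counter()
    for frame in frames:
        pose = detect(frame)
        n_frames += 1
        if pose is not None:
            n_detected += 1
    elapsed = time.perf_counter() - start

    if n_frames == 0:
        return 0.0, 0.0
    # Guard against a zero elapsed time on very fast dummy detectors.
    fps = n_frames / elapsed if elapsed > 0 else float("inf")
    detection_rate = 100.0 * n_detected / n_frames
    return fps, detection_rate


if __name__ == "__main__":
    # Dummy detector for demonstration: "finds" a pose in every second frame.
    dummy_frames = list(range(10))
    fps, rate = benchmark_pose_estimator(
        lambda f: [0.0] if f % 2 == 0 else None, dummy_frames
    )
    print(f"detection rate: {rate:.0f}%")
```

A detection rate computed this way counts every returned pose, whether or not it is correct, so it says nothing about pose accuracy.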
Results BlazePose demonstrated a higher average processing speed than RTMO: 22 FPS vs. 2 FPS. In uncropped videos, BlazePose detected a pose in 5% of outdoor video footage and 33% of indoor video footage. In close-up videos, BlazePose detected poses in 99% of outdoor video footage and 93% of indoor video footage. RTMO achieved a 100% pose detection rate across all videos, but a subsequent inspection showed that many of the poses counted as “detected” were false.
Discussion/Conclusion An automated video assistant referee should decide quickly. RTMO seems too slow with the given hardware (AMD Ryzen 7, 3.2 GHz, 16 GB RAM), since pose estimation of a 5 s video sequence (150 frames) would take more than a minute. The video would also have to be cropped before each pose estimation; otherwise few (BlazePose) or incorrect (RTMO) poses would be detected, because other picture content is larger and has more contrast than the athlete.
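The timing argument can be checked with a short calculation. The sketch below uses the average speeds reported above (22 FPS and 2 FPS) and assumes constant throughput; it ignores the time needed for cropping, and the helper name `processing_time_s` is introduced here purely for illustration.

```python
def processing_time_s(n_frames: int, fps: float) -> float:
    """Estimated processing time in seconds, assuming constant throughput."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return n_frames / fps


CLIP_FRAMES = 150  # 5 s video sequence, as stated in the text

# Average processing speeds reported in the Results.
for name, fps in (("BlazePose", 22.0), ("RTMO", 2.0)):
    print(f"{name}: {processing_time_s(CLIP_FRAMES, fps):.1f} s")
# BlazePose: 6.8 s
# RTMO: 75.0 s
```

At 2 FPS the 150 frames take 75 s, which is the "more than a minute" referred to above, whereas at 22 FPS the same sequence takes about 7 s.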
Next, we will compare the output of our scoring algorithm, which represents the given regulations, with the scores awarded by judges. The results will then be discussed at the next educational event for judges at national and international level.
References
Kim, J.-W., Choi, J.-Y., Ha, E.-J., & Choi, J.-H. (2023). Human pose estimation using MediaPipe pose and optimization method based on a humanoid model. Applied Sciences, 13(4), 2700. https://doi.org/10.3390/app13042700
Lu, P., Jiang, T., Li, Y., Li, X., Chen, K., & Yang, W. (2024). RTMO: Towards high-performance one-stage real-time multi-person pose estimation. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 1491–1500. https://doi.org/10.1109/CVPR52733.2024.00148
License
Copyright (c) 2025 Eliane Künzler, David Roder, Urs Stöcker, Peter Wolf
This work is licensed under a Creative Commons Attribution 4.0 International License.