LocoTrack is a highly efficient model that enables near-dense point tracking in real time. It is almost 6× faster than the previous state-of-the-art model. In the plot, the y-axis shows accuracy, the x-axis shows speed, and the size of each circle shows the number of parameters.

Abstract

We introduce LocoTrack, a highly accurate and efficient model designed for the task of tracking any point (TAP) across video sequences. Previous approaches to this task often rely on local 2D correlation maps to establish correspondences from a point in the query image to a local region in the target image. These maps often struggle with homogeneous regions or repetitive features, leading to matching ambiguities. LocoTrack overcomes this challenge with a novel approach that utilizes all-pair correspondences across regions, i.e., local 4D correlation, to establish precise correspondences, with bidirectional correspondence and matching smoothness significantly enhancing robustness against ambiguities. We also incorporate a lightweight correlation encoder to enhance computational efficiency, and a compact Transformer architecture to integrate long-term temporal information. LocoTrack achieves unmatched accuracy on all TAP-Vid benchmarks and operates at a speed almost 6 times faster than the current state-of-the-art.
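To make the difference between local 2D and local 4D correlation concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes dense feature maps of shape (H, W, C), integer point coordinates, a single scale, and zero padding at the borders. The function names (`local_patch`, `local_2d_correlation`, `local_4d_correlation`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def local_patch(feat, center, radius):
    """Crop a (2r+1, 2r+1, C) window around an integer (y, x) center.

    Borders are zero-padded so that every center yields a full window.
    """
    padded = np.pad(feat, ((radius, radius), (radius, radius), (0, 0)))
    y, x = center
    size = 2 * radius + 1
    # After padding, original pixel (y, x) sits at (y + radius, x + radius),
    # so this slice covers original rows y-r..y+r and columns x-r..x+r.
    return padded[y:y + size, x:x + size]


def local_2d_correlation(query_feat, target_feat, query_pt, target_pt, radius):
    """Correlate a single query feature vector with a target neighborhood."""
    q = query_feat[query_pt[0], query_pt[1]]          # (C,)
    t = local_patch(target_feat, target_pt, radius)   # (k, k, C)
    return np.einsum("c,klc->kl", q, t)               # (k, k)


def local_4d_correlation(query_feat, target_feat, query_pt, target_pt, radius):
    """Correlate every position in a query neighborhood with every position
    in a target neighborhood (all-pair dot products)."""
    q = local_patch(query_feat, query_pt, radius)     # (k, k, C)
    t = local_patch(target_feat, target_pt, radius)   # (k, k, C)
    return np.einsum("ijc,klc->ijkl", q, t)           # (k, k, k, k)


# Illustrative usage with random features (assumed shapes, not real data).
rng = np.random.default_rng(0)
query_feat = rng.standard_normal((32, 32, 16))
target_feat = rng.standard_normal((32, 32, 16))
r = 3
corr2d = local_2d_correlation(query_feat, target_feat, (10, 12), (11, 14), r)
corr4d = local_4d_correlation(query_feat, target_feat, (10, 12), (11, 14), r)
print(corr2d.shape, corr4d.shape)  # (7, 7) (7, 7, 7, 7)
```

In this sketch the 2D map is the central slice `corr4d[r, r]` of the 4D volume. The remaining slices describe how the neighbors of the query point match against the target region, which is the extra context the abstract credits with reducing matching ambiguities.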

Extreme Efficiency

Results on TAP-Vid-DAVIS with a varying number of refinement iterations. The number in each circle denotes the number of iterations. (Top) At 256×256 resolution, our model outperforms TAPIR[1] with a single iteration while being about 9× faster. (Bottom) At 384×512 resolution, our model achieves performance comparable to CoTracker[2] while being about 9× faster.


Model Comparisons

Side-by-side video comparisons of CoTracker[2], TAPIR[1], and ours on five videos: Dance Twirl, Scooter Board, Swing, Drone, and Scooter Gray.

Citation

References

[1] Doersch et al., "TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement", ICCV 2023.

[2] Karaev et al., "CoTracker: It is Better to Track Together", ECCV 2024.


Acknowledgements

The website template was borrowed from Michaël Gharbi.