Abstract
Deep learning methods have led to remarkable progress in multiple object tracking (MOT). However, when tracking in crowded scenes, existing methods still suffer from both inaccurate and missing detections. This paper proposes Detection Refinement for Tracking (DRT) to address these two issues for people tracking. First, we construct an encoder-decoder backbone network with a novel semi-supervised heatmap training procedure, which leverages human heatmaps to localize the targets more precisely. Second, we integrate a "one patch, multiple predictions" mechanism into DRT, which refines the detection results and recovers occluded pedestrians at the same time. Additionally, we leverage a data-driven LSTM-based motion model that can recover lost targets at a negligible computational cost. Compared with strong baseline methods, DRT achieves significant improvements on publicly available MOT datasets. In addition, DRT generalizes well, i.e., it can be applied to any detector to improve its performance.
Original language | English |
---|---|
Title of host publication | British Machine Vision Conference (BMVC) 2021 |
Publisher | The British Machine Vision Association |
Number of pages | 14 |
Publication status | Published - 23 Nov 2021 |
Event | 32nd British Machine Vision Conference: BMVC 2021 - Virtual, United Kingdom; Duration: 22 Nov 2021 → 25 Nov 2021 |
Conference
Conference | 32nd British Machine Vision Conference |
---|---|
Abbreviated title | BMVC 2021 |
Country/Territory | United Kingdom |
City | Virtual |
Period | 22/11/21 → 25/11/21 |