
OCSORT + ByteTrack?

See original GitHub issue

Thanks for the amazing work again!

After replacing the SORT Kalman filter in ocsort.py with the JDE Kalman filter, I got higher HOTA and faster speed, which may indicate that OC-SORT with the SORT settings can still be improved.

So, do you plan to provide a version of ocsort with BYTE?
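For readers unfamiliar with the two filters being compared: the SORT Kalman filter in ocsort.py builds its observation from center, area, and aspect ratio (7-dim state, since aspect ratio gets no velocity), while the JDE/DeepSORT-style filter uses center, aspect ratio, and height (8-dim state, every component has a velocity, and its noise is scaled by box height). A minimal sketch of the two observation parameterizations (the helper names below are mine, not from either repo):

```python
import numpy as np

def bbox_to_sort_state(bbox):
    """SORT-style observation: [cx, cy, s, r], where s = area and
    r = aspect ratio. The full SORT Kalman state is 7-dim:
    [cx, cy, s, r, vcx, vcy, vs] (r is assumed constant, so no velocity)."""
    x1, y1, x2, y2 = bbox
    w, h = x2 - x1, y2 - y1
    return np.array([x1 + w / 2.0, y1 + h / 2.0, w * h, w / float(h)])

def bbox_to_jde_state(bbox):
    """JDE/DeepSORT-style observation: [cx, cy, a, h], where a = aspect
    ratio and h = height. The full state is 8-dim: every component gets a
    velocity, and measurement noise is typically scaled by box height."""
    x1, y1, x2, y2 = bbox
    w, h = x2 - x1, y2 - y1
    return np.array([x1 + w / 2.0, y1 + h / 2.0, w / float(h), h])
```

Swapping the filter therefore changes both what is linear in the state (area vs height) and how noise scales with box size, which matters for the results discussed below.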

Issue Analytics

  • State: closed
  • Created: a year ago
  • Reactions: 1
  • Comments: 16 (13 by maintainers)

Top GitHub Comments

14 reactions
HanGuangXin commented, Apr 21, 2022

[results figure: metric comparison of OC-SORT variants]

Here are my results, using the pretrained model to run evaluation on MOT17_val_half and DanceTrack_val. For each metric, red > blue > green.

First, the observations that do not confuse me:

  1. The relative performance of ByteTrack and OC-SORT differs between MOT17 and DanceTrack. On MOT17, ByteTrack consistently performs better than OC-SORT across different settings. But I think this is reasonable, because OC-SORT is meant to improve performance under occlusion and non-linear motion.
  2. For the original ByteTrack and OC-SORT on DanceTrack: ByteTrack has higher MOTA, but OC-SORT has much higher HOTA. I think this is reasonable too, because ByteTrack uses BYTE to improve MOTA.
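The MOTA-vs-HOTA split described above follows partly from how MOTA is defined: it sums error counts frame-wise, so recalling extra detections (fewer FNs) raises MOTA even when the extra boxes occasionally cause identity switches. A toy sketch with hypothetical error counts (not from the experiments above):

```python
def mota(fn, fp, idsw, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT. All error types are pooled
    frame-wise: a box missed for 50 frames contributes 50 FNs, while an
    identity switch is counted once, so reducing FNs dominates the score."""
    return 1.0 - (fn + fp + idsw) / float(num_gt)

# Hypothetical numbers: recalling low-confidence boxes removes 500 FNs at
# the cost of 50 extra FPs and 20 extra ID switches -> MOTA still improves.
base = mota(fn=1500, fp=300, idsw=80, num_gt=10000)       # 0.812
recalled = mota(fn=1000, fp=350, idsw=100, num_gt=10000)  # 0.855
```

HOTA, by contrast, balances detection and association accuracy, so the same trade can lower it, which is relevant to the DanceTrack results below.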

Then, the observations that do confuse me:

  1. About the JDE Kalman filter and the SORT Kalman filter: the JDE Kalman filter performs better on MOT17, but worse on DanceTrack. Why is that?
  2. Questions about NSA can be set aside for now.
  3. BYTE benefits both MOTA and HOTA in OC-SORT on MOT17, which I think is reasonable. But BYTE only benefits MOTA in OC-SORT on DanceTrack, and harms HOTA a lot, which confuses me. Why does BYTE even harm HOTA in OC-SORT?
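For context, the BYTE association these questions refer to is a two-stage matching: tracks are matched to high-confidence detections first, and leftover tracks are then matched against the low-confidence detections instead of discarding them. A simplified greedy-IoU sketch (the real ByteTrack implementation uses Hungarian matching on Kalman-predicted boxes; the thresholds here are illustrative):

```python
def iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def byte_associate(tracks, dets, scores, high_thresh=0.6, iou_thresh=0.3):
    """Two-stage BYTE-style association (greedy IoU, simplified).
    tracks: predicted track boxes; dets/scores: detections.
    Returns a list of (track_idx, det_idx) matches."""
    high = [i for i, s in enumerate(scores) if s >= high_thresh]
    low = [i for i, s in enumerate(scores) if s < high_thresh]
    matches, unmatched_tracks = [], list(range(len(tracks)))
    for det_pool in (high, low):          # stage 1: high, stage 2: low
        for d in det_pool:
            best_t, best_iou = None, iou_thresh
            for t in unmatched_tracks:
                ov = iou(tracks[t], dets[d])
                if ov > best_iou:
                    best_t, best_iou = t, ov
            if best_t is not None:
                matches.append((best_t, d))
                unmatched_tracks.remove(best_t)
    return matches
```

The second stage is what "recalls" occluded targets whose detection scores dropped, which is central to the MOTA/HOTA discussion in the replies below.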

Looking forward to your reply.

11 reactions
noahcao commented, Apr 14, 2022

Hi @HanGuangXin ,

You have done a really wonderful study! It is quite impressive to me. I have some experience and thoughts on the observations you provide.

  1. Why OC-SORT is inferior to ByteTrack on MOT17 half_val: there are some insights:
  • MOT17 is a dataset where objects (pedestrians) usually move very linearly. So, given a linear-motion-based Kalman filter, most of its failures are caused by missing "observations" (detections). If you run an ablation study over the IoU threshold in OCR, or the IoU threshold during general association, you may get the impression that the key to boosting performance on such a dataset is to "recall more observations!". BYTE is designed to use the low-confidence detections, so it can perform well in this situation by recalling more detections.
  • Here is another key point: the split of MOT17 into train_half/val_half is a compromise due to limited data. It may not be entirely reasonable that these two parts come from the same video sequences (the former half and the latter half). So, during training, the detector (or even the tracker, for some joint-detection-and-tracking methods) has actually seen the objects in the val_half subset. As a consequence, it is quite safe to trust the detections predicted on val_half even when their confidence score / IoU score is not that high. The "recall more" strategy can be even more successful given this background.
  • Given that the motion pattern of objects in MOT17 is simple, the overall performance (even under HOTA) is heavily influenced by the detection part. We actually have a study in the DanceTrack paper showing that, given oracle detections, even the most naive IoU matching yields nearly perfect tracking performance on MOT17 (HOTA = 98.1). Given this bias, MOT17 may encourage methods that focus more on detection quality, which is complementary to the first two points above.
  2. Why JDE is inferior to the SORT KF on DanceTrack: there are many variants in the implementation of JDE that can influence the results. For example: (1) have you considered the influence of OOS from OC-SORT in the comparison? (2) how do you generate the embeddings for JDE? And so on. But one potential reason comes to mind first: JDE is designed to incorporate object appearance features. Since object appearance in MOT17 is usually distinguishable, appearance embeddings are usually helpful in association. But object appearance in DanceTrack is very similar, so appearance embeddings carry a lot of noise for association. I would recommend reading the original DanceTrack paper for more details hidden in the dataset's characteristics.
  3. Why BYTE even hurts HOTA on DanceTrack: this also comes from the nature of DanceTrack, where detection is very easy (refer to Table 3 in the DanceTrack paper; all detection-focused metrics are much higher on DanceTrack than on MOT17). So the detection confidence of true targets is usually very high. The typical situation where a detection's confidence is low is when it overlaps heavily with another object. Therefore, BYTE's strategy of bringing in more detections is likely to introduce more noisy observations than it does on MOT17. One more detection likely means one fewer FN during evaluation, yielding higher MOTA. But that extra detection may overlap heavily with other targets, making association harder and increasing the chance of ID switches. So lower HOTA can be expected, since HOTA evaluates tracking performance at the tracklet level instead of the frame level.
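The ID-switch risk in point 3 can be illustrated with a toy geometry (hypothetical boxes, not from any dataset): a recalled low-confidence detection sitting between two crowded tracks overlaps both almost equally, so the assignment hinges on prediction noise:

```python
def iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# Two adjacent tracks and one recalled low-confidence box between them:
track_a = [0, 0, 10, 10]
track_b = [8, 0, 18, 10]
low_conf_det = [4, 0, 14, 10]

iou_a = iou(low_conf_det, track_a)  # overlap with track A
iou_b = iou(low_conf_det, track_b)  # overlap with track B
# The two IoUs are (near-)identical: the matcher has no margin, so small
# Kalman prediction errors can flip the assignment -> ID switches, lower HOTA.
```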

I provide some intuitions and experience from my own study above for your questions; I hope they are helpful. Again, dataset bias is always important when we evaluate an algorithm. I highly recommend reading the DanceTrack paper for more details.

To make our discussion helpful to a broad community, let’s discuss here instead of via private message platforms.

