Find the download links to several datasets here:
Here you can find more details about the datasets and, for some of them, the format requirements your predictions must meet to be properly evaluated. Note that you need to be registered before you can participate in the benchmark; see the instructions for that in the FAQs.
You need to submit a json-file in COCO format, that is, a list of dictionaries, each containing a single prediction of the form
{ "image_id": 6503, "category_id": 4, "score": 0.6774558424949646, "bbox": [ 426.15203857421875, 563.6422119140625, 43.328399658203125, 18.97894287109375 ] }
The predictions are evaluated on AP@[0.50:0.05:0.95] (averaged over IoU thresholds from 0.50 to 0.95 in steps of 0.05), AP50, AP75, AR1 and AR10.
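As an illustration, here is a minimal sketch of how such a submission file could be assembled. The raw_predictions list is a placeholder for your detector's output (its single entry repeats the example above); only the structure written by json.dump matters.

import json

# Placeholder detector output; in practice these come from your model.
# Each entry: (image_id, category_id, score, [x, y, w, h]) in pixel coordinates.
raw_predictions = [
    (6503, 4, 0.6774558424949646,
     [426.15203857421875, 563.6422119140625, 43.328399658203125, 18.97894287109375]),
]

submission = [
    {"image_id": img_id, "category_id": cat_id, "score": float(score),
     "bbox": [float(v) for v in bbox]}
    for img_id, cat_id, score, bbox in raw_predictions
]

with open("predictions.json", "w") as f:
    json.dump(submission, f)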
There are 80 test video clips on which the performance is measured. The evaluation protocol is based on the PyTracking implementation. Note that this is a short-term Single-Object Tracking task, meaning that each clip only features objects that are present throughout the whole video and do not disappear and reappear.
You need to submit a zip-file containing exactly 80 text files, one per clip. Each text file has to be named j.txt, where j is the number of the corresponding clip (1,...,80). Each text file has as many rows as its corresponding clip has frames. Each row has 4 comma-separated numbers (x,y,w,h), where x is the left-most pixel coordinate of the tracked object's bounding box, y the top-most pixel coordinate, and w and h the width and height of the bounding box in pixels.
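The following sketch shows one way to produce such a zip-archive; the tracks dictionary is filled with dummy boxes here and merely stands in for your tracker's per-frame output.

import zipfile

# Dummy per-clip tracker output: tracks[j] holds one (x, y, w, h) tuple
# per frame of clip j, in pixel coordinates.
tracks = {j: [(100.0, 50.0, 40.0, 20.0)] * 10 for j in range(1, 81)}

with zipfile.ZipFile("sot_submission.zip", "w") as zf:
    for j in range(1, 81):
        lines = ["{},{},{},{}".format(x, y, w, h) for (x, y, w, h) in tracks[j]]
        zf.writestr("{}.txt".format(j), "\n".join(lines) + "\n")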
Via the link for "Single-Object Tracking" above, you will find three json-files (SeaDronesSee_train.json, SeaDronesSee_val.json, SeaDronesSee_test.json). Each is a dict of dicts: for each track/clip number, you find the corresponding frames that need to be taken from the Multi-Object Tracking link.
For example, below you can see the first track starting with frame 000486.png, whose corresponding path is lake_constance_v2021_tracking/images/test/486.png, followed by frame 000487.png and so on. Afterwards, clip 2 starts with frame 000494.png:
{"1": {"000486.png": "lake_constance_v2021_tracking/images/test/486.png", "000487.png": "lake_constance_v2021_tracking/images/test/487.png", "000488.png": "lake_constance_v2021_tracking/images/test/488.png",... "000636.png": "lake_constance_v2021_tracking/images/test/636.png"}, "2": {"000494.png": "lake_constance_v2021_tracking/images/test/494.png",...
Furthermore, you find three folders: train_annotations, val_annotations, test_annotations_first_frame. For the train and val case, these contain the annotations for each clip of the respective train or val set. Each clip has its own text file, with each line containing the bounding box for the corresponding frame. The test folder contains a text file for each clip as well, but it only holds the ground-truth bounding box for the very first frame and dummy values for the succeeding frames.
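For initializing a tracker on the test clips, the first-frame ground truth can be read as in the sketch below. It assumes the text files inside test_annotations_first_frame are named after the clip number (e.g. 1.txt) and use the same comma-separated x,y,w,h layout described above; adapt the path and naming if your download differs.

def read_init_box(clip_number, folder="test_annotations_first_frame"):
    # Assumed naming: one <clip_number>.txt per clip; only the first line
    # contains a valid ground-truth box, the remaining lines are dummies.
    with open("{}/{}.txt".format(folder, clip_number)) as f:
        x, y, w, h = (float(v) for v in f.readline().strip().split(","))
    return x, y, w, h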
See also the compressed folder sample_submission.zip in the Single-Object Tracking nextcloud folder. This zip-archive could be uploaded right away but will naturally yield bad results.
The predictions are evaluated on precision and success numbers.
There are 22 video clips in the data on which you can test your trained tracker. The mapping from images to video clips can be done via the files 'instances_test_objects_in_water.json' and 'instances_test_swimmer.json', respectively. They can be found via the link above.
For example, '410.png' from the test set can be assigned to video clip 'DJI_0057.MP4' because its entry in the annotation file looks like this:
{'id': 410, 'file_name': '410.png', 'height': 2160, 'width': 3840, 'source': {'drone': 'mavic', 'folder_name': 'DJI_0057', 'video': 'DJI_0057.MP4', 'frame_no': 715}, 'video_id': 0, 'frame_index': 715, 'date_time': '2020-08-27T14:18:35.823800', 'meta': {'date_time': '2020-08-27T12:18:36', 'gps_latitude': 47.671949, 'gps_latitude_ref': 'N', 'gps_longitude': 9.269724, 'gps_longitude_ref': 'E', 'altitude': 8.599580615665955, 'gimbal_pitch': 45.4, 'compass_heading': 138.2, 'gimbal_heading': 140.9, 'speed': 0.6399828341528834, 'xspeed': -0.39998927134555207, 'yspeed': 0.39998927134555207, 'zspeed': 0.299991953509164}}
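Under the assumption that the file follows the usual COCO layout (image entries stored under an "images" key, as the entry above suggests), the test frames can be grouped by source clip like this:

import json
from collections import defaultdict

with open("instances_test_objects_in_water.json") as f:
    coco = json.load(f)

# Group test images by the video clip they were extracted from.
frames_per_clip = defaultdict(list)
for img in coco["images"]:
    frames_per_clip[img["source"]["video"]].append((img["frame_index"], img["file_name"]))

# Restore temporal order within each clip.
for frames in frames_per_clip.values():
    frames.sort()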
To submit your results to the MOT All-Objects-in-Water SeaDronesSee challenge as part of MaCVi, you have to upload a json-file in the format of mmtracking. See an example on the challenge page.
The submission format for MOT Swimmers is similar to the one for the MOT-challenge.
To submit your results for MOT Swimmers, you have to upload a zip file containing one [video_id].txt file per video clip at the top level of the archive. The ID of each video can be obtained from the .json file, where the information about each video looks like this:
{'id': 0, 'height': 2160, 'width': 3840, 'name:': '/data/input/recordings/mavic/DJI_0057.MP4'}
Inside each of the .txt files there has to be one line per object per frame. Each line is formatted as:
[frame_id],[object_id],x,y,w,h
frame_id and object_id must be integers; the remaining numbers may be floats. The frame_id can be obtained from the .json file, while the object_id is assigned by your tracker. Coordinates x and y denote the upper-left corner of the bounding box, and w and h are its width and height, respectively. All of these values are expressed in pixels.
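A minimal sketch of assembling such a zip file; the results dictionary here holds dummy tracker output keyed by video id.

import zipfile

# Dummy tracker output: results[video_id] is a list of
# (frame_id, object_id, x, y, w, h) tuples.
results = {
    0: [(0, 1, 1203.5, 540.0, 55.0, 30.0),
        (1, 1, 1205.0, 541.5, 55.0, 30.0)],
}

with zipfile.ZipFile("mot_swimmers_submission.zip", "w") as zf:
    for video_id, rows in results.items():
        lines = ["{:d},{:d},{},{},{},{}".format(int(f), int(o), x, y, w, h)
                 for f, o, x, y, w, h in rows]
        # One [video_id].txt per clip at the top level of the archive.
        zf.writestr("{}.txt".format(video_id), "\n".join(lines) + "\n")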
This is a toy data set for the task of binary image classification. It aims at providing a simple hands-on benchmark to test small neural networks. There are the following two classes:
1 - the image contains any watercraft instance, including boats, ships, surfboards, ..., ON the water
0 - everything else, i.e. just water or anything on land (which could also include boats)
Naturally, there may be edge cases (e.g. boats at the verge of the water and the shore). As metrics, we employ the prediction accuracy (the number of correctly predicted images divided by the number of all images) and the number of parameters of the model. For this benchmark, you can upload your trained ONNX model to be ranked on the leaderboard. For that, please refer to this sample script. It trains a simple single-layer perceptron architecture on this data set and then saves and exports the PyTorch model as an ONNX file. Make sure the exported model uses the transformation provided in this code, as this is the transformation used for the webserver evaluation.
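For reference, a minimal sketch of exporting a small PyTorch classifier to ONNX is given below. The 64x64 input resolution and the flattening are placeholders; take the actual input size and transformation from the provided sample script, since the webserver applies exactly that transformation during evaluation.

import torch
import torch.nn as nn

class Perceptron(nn.Module):
    # Single linear layer on flattened RGB input (placeholder architecture).
    def __init__(self, in_pixels=3 * 64 * 64, num_classes=2):
        super().__init__()
        self.fc = nn.Linear(in_pixels, num_classes)

    def forward(self, x):
        return self.fc(x.flatten(start_dim=1))

model = Perceptron()
# ... train the model on the Boat-MNIST data here ...
dummy_input = torch.randn(1, 3, 64, 64)  # must match the sample script's input size
torch.onnx.export(model, dummy_input, "boat_mnist.onnx",
                  input_names=["input"], output_names=["output"])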