[Feature] Parse the model response #36

@angel-penchev

Describe the feature you'd like
The YOLO neural network returns an output of shape (16, 16, 5, 12). The 16 x 16 part corresponds to the square divisions of the image (grid cells), each with a side of 32 pixels (512/32 = 16). The 5 x 12 part holds the 5 bounding box predictions that each grid cell proposes. A bounding box prediction has 12 values: 5 box values (x of the box center, y of the box center, width of the box, height of the box, probability that an object exists in this box) + 7 probabilities of a given class existing in the grid cell (classes listed here).
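To make the layout of one prediction concrete, here is a minimal sketch that splits a single 12-value vector into its box fields and class probabilities. The names `BoxPrediction` and `unpack_prediction` are hypothetical, and the class order is an assumption taken from the alphabetical order of the example dictionary in this issue; the real order has to come from the model's class list.

```python
from typing import List, NamedTuple, Sequence

# ASSUMPTION: alphabetical class order, as in the example dictionary.
# Check it against the class list the model was trained with.
CLASSES = ["bicycle", "bus", "car", "horse", "motorbike", "person", "train"]

BOX_FIELDS = 5  # x, y, w, h, objectness


class BoxPrediction(NamedTuple):
    """One bounding box prediction (hypothetical helper type)."""
    x: float            # x of the box center
    y: float            # y of the box center
    w: float            # width of the box
    h: float            # height of the box
    objectness: float   # probability that an object exists in this box
    class_probs: List[float]  # one probability per entry in CLASSES


def unpack_prediction(vector: Sequence[float]) -> BoxPrediction:
    """Split a 12-value prediction into box fields and class probabilities."""
    expected = BOX_FIELDS + len(CLASSES)
    if len(vector) != expected:
        raise ValueError(f"expected {expected} values, got {len(vector)}")
    x, y, w, h, objectness = vector[:BOX_FIELDS]
    return BoxPrediction(x, y, w, h, objectness, list(vector[BOX_FIELDS:]))
```

With this layout, the 'probability that an object exists in this box' sits at index 4 and the 7 class probabilities at indices 5 to 11.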

A much clearer explanation can be found here.

Now for the part you need to code. First, define two constants: MIN_SCORE = 0.5 and MIN_IOU = 0.45. Then iterate over every grid cell (16 x 16 = 256 grid cells in total).
For each cell, iterate over its 5 bounding box predictions. If the 'probability that an object exists in this box' (the 5th element of the prediction) is higher than MIN_SCORE and a given class probability is higher than MIN_IOU, add one to the count for that class in the final dictionary. The dictionary should look something like this: { "bicycle": 0, "bus": 1, "car": 8, "horse": 0, "motorbike": 0, "person": 0, "train": 0 }.
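The counting loop described above could be sketched roughly as follows. This is only one reading of the rule, not a final implementation: `count_objects` is a hypothetical name, the class order is assumed to match the example dictionary, the values are assumed to already be probabilities in [0, 1], and a box is counted once for every class whose probability exceeds MIN_IOU. The name MIN_IOU is kept from this issue even though it is used here as a class probability threshold.

```python
MIN_SCORE = 0.5   # threshold for "an object exists in this box"
MIN_IOU = 0.45    # threshold for each class probability (name from the issue)

# ASSUMPTION: class order follows the example dictionary.
CLASSES = ["bicycle", "bus", "car", "horse", "motorbike", "person", "train"]


def count_objects(output):
    """Count detections per class in a (16, 16, 5, 12) nested sequence."""
    counts = {name: 0 for name in CLASSES}
    for row in output:              # 16 rows of grid cells
        for cell in row:            # 16 grid cells per row
            for box in cell:        # 5 bounding box predictions per cell
                objectness = box[4]  # the 5th element of the prediction
                if objectness <= MIN_SCORE:
                    continue
                # The remaining 7 values are the class probabilities.
                for name, prob in zip(CLASSES, box[5:]):
                    if prob > MIN_IOU:
                        counts[name] += 1
    return counts


# Usage: an all-zero output with a single confident "car" prediction.
output = [[[[0.0] * 12 for _ in range(5)] for _ in range(16)] for _ in range(16)]
output[3][7][0] = [0.5, 0.5, 0.2, 0.3, 0.9, 0.0, 0.0, 0.8, 0.0, 0.0, 0.0, 0.0]
print(count_objects(output))
```

The sketch works on plain nested lists so it has no dependencies; if the server receives the response as a NumPy array, the same loops apply unchanged.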

Additional context
Traffic Brain Networking Convert response

Labels

feature (New feature or request), server (Request regarding the main project server)
