CPC G06V 20/60 (2022.01) [G06F 18/253 (2023.01); G06N 3/04 (2013.01); G06N 3/08 (2013.01); G06V 10/40 (2022.01)] | 18 Claims |
1. A system comprising:
a processor configured to execute program logic to implement a multi-scale object detection neural network comprising:
a backbone network configured to receive an input image and generate multi-scale feature representations of the input image comprising a feature map for each of a plurality of image scales;
a feature fusion block configured to generate feature outputs for each of the plurality of image scales, at least one feature output comprising a fused feature map comprising features extracted at a corresponding image scale by the backbone network, and features corresponding to at least one smaller scale of the plurality of image scales;
a plurality of representation transfer modules configured to isolate and decouple processing of the multi-scale feature representations, wherein each of the feature outputs generated by the feature fusion block is output to two corresponding representation transfer modules;
a cascade refinement module configured to process each representation transfer module output to refine predictions, the cascade refinement module comprising, for each of the two corresponding representation transfer modules, a first processing path configured to generate predictions as refined anchors, and a second processing path configured to generate output predictions based, at least in part, on the refined anchors; and
an output layer configured to process the generated output predictions for each of the plurality of image scales and generate object detection information for the input image.
|
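The data flow recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation: it assumes 1-D lists in place of 2-D feature maps, average pooling in place of a learned backbone, and fixed scalar weights in place of learned layers. All function names, weights, and the score threshold (`backbone`, `fuse`, `transfer`, `refine_anchors`, `predict`, `detect`, `0.5`, `0.25`, `threshold`) are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

FeatureMap = List[float]                 # toy 1-D stand-in for a 2-D feature map
Box = Tuple[float, float]                # toy 1-D box: (center, width)
Detection = Tuple[float, float, float]   # (center, width, score)


def backbone(image: List[float], num_scales: int = 3) -> List[FeatureMap]:
    """Produce one feature map per scale by repeated 2x average pooling."""
    maps = [[float(x) for x in image]]
    for _ in range(num_scales - 1):
        prev = maps[-1]
        maps.append([(prev[i] + prev[i + 1]) / 2 for i in range(0, len(prev) - 1, 2)])
    return maps  # largest scale first, smallest scale last


def fuse(maps: List[FeatureMap]) -> List[FeatureMap]:
    """Top-down fusion: add the upsampled smaller-scale output to each scale."""
    fused = [maps[-1]]  # the smallest scale has nothing smaller to fuse with
    for fmap in reversed(maps[:-1]):
        smaller = fused[0]
        # Nearest-neighbour 2x upsampling of the next smaller fused map.
        up = [smaller[min(i // 2, len(smaller) - 1)] for i in range(len(fmap))]
        fused.insert(0, [a + b for a, b in zip(fmap, up)])
    return fused


def transfer(fmap: FeatureMap, weight: float) -> FeatureMap:
    """Representation transfer module: an independent transform per path."""
    return [weight * x for x in fmap]


def default_anchors(n: int, image_len: int) -> List[Box]:
    """One anchor per feature position, sized by the stride of that scale."""
    stride = image_len / n
    return [((i + 0.5) * stride, stride) for i in range(n)]


def refine_anchors(feats: FeatureMap, anchors: List[Box]) -> List[Box]:
    """First processing path: shift each anchor by a feature-derived offset."""
    return [(c + 0.1 * f, w) for f, (c, w) in zip(feats, anchors)]


def predict(feats: FeatureMap, refined: List[Box]) -> List[Detection]:
    """Second processing path: score each refined anchor."""
    return [(c, w, f) for f, (c, w) in zip(feats, refined)]


def detect(image: List[float], threshold: float = 1.0) -> List[Detection]:
    """Output layer: gather predictions from all scales and keep strong ones."""
    detections: List[Detection] = []
    for fmap in fuse(backbone(image)):
        t1 = transfer(fmap, 0.5)    # feeds the anchor-refinement path
        t2 = transfer(fmap, 0.25)   # feeds the output-prediction path
        refined = refine_anchors(t1, default_anchors(len(fmap), len(image)))
        detections += [d for d in predict(t2, refined) if d[2] >= threshold]
    return sorted(detections, key=lambda d: d[2], reverse=True)
```

Calling `detect([0, 0, 0, 0, 4, 4, 0, 0])` runs the full chain: three scales from the backbone, top-down fusion, two transfer outputs per scale, anchor refinement followed by prediction, and a final thresholded list of detections. Keeping `t1` and `t2` as separate transforms of the same fused map mirrors the claim's requirement that each feature output is passed to two corresponding representation transfer modules.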