US 12,169,974 B2
Efficient refinement neural network for real-time generic object-detection systems and methods
Jun Wang, Waterloo (CA)
Assigned to FLIR Unmanned Aerial Systems ULC, Vancouver (CA)
Filed by FLIR Unmanned Aerial Systems ULC, Vancouver (CA)
Filed on Jul. 13, 2021, as Appl. No. 17/374,909.
Claims priority of provisional application 63/051,823, filed on Jul. 14, 2020.
Prior Publication US 2022/0019843 A1, Jan. 20, 2022
Int. Cl. G06V 20/60 (2022.01); G06F 18/25 (2023.01); G06N 3/04 (2023.01); G06N 3/08 (2023.01); G06V 10/40 (2022.01)
CPC G06V 20/60 (2022.01) [G06F 18/253 (2023.01); G06N 3/04 (2013.01); G06N 3/08 (2013.01); G06V 10/40 (2022.01)]
18 Claims
OG exemplary drawing
 
1. A system comprising:
a processor configured to execute program logic to implement a multi-scale object detection neural network comprising:
a backbone network configured to receive an input image and generate multi-scale feature representations of the input image comprising a feature map for each of a plurality of image scales;
a feature fusion block configured to generate feature outputs for each of the plurality of image scales, at least one feature output comprising a fused feature map comprising features extracted at a corresponding image scale by the backbone network, and features corresponding to at least one smaller scale of the plurality of image scales;
a plurality of representation transfer modules configured to isolate and decouple processing of the multi-scale feature representations, wherein each of the feature outputs generated by the feature fusion block is output to two corresponding representation transfer modules;
a cascade refinement module configured to process each representation transfer module output to refine predictions, the cascade refinement module comprising, for each of the two corresponding representation transfer modules, a first processing path configured to generate predictions as refined anchors, and a second processing path configured to generate output predictions based, at least in part, on the refined anchors; and
an output layer configured to process the generated output predictions for each of the plurality of image scales and generate object detection information for the input image.
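The feature fusion block of claim 1 can be illustrated with a minimal top-down fusion sketch. The claim does not say how features are combined, so this sketch assumes that a "smaller scale" is a lower-resolution (coarser) map, that each map is a single-channel grid, that adjacent scales differ by a factor of two, and that fusion is nearest-neighbour upsampling followed by element-wise addition, as in feature pyramid networks. The names `upsample2x`, `add_maps`, and `fuse_top_down` are hypothetical.

```python
from typing import List

Grid = List[List[float]]  # one single-channel feature map, rows x cols


def upsample2x(fmap: Grid) -> Grid:
    """Nearest-neighbour 2x upsampling: each cell becomes a 2x2 block."""
    out: Grid = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))  # copy so the two rows are independent
    return out


def add_maps(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two equally sized feature maps (assumed fusion op)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fuse_top_down(features: List[Grid]) -> List[Grid]:
    """Fuse backbone outputs ordered from largest to smallest resolution.

    Returns one output per scale. The smallest (coarsest) map passes through
    unchanged; every larger map is summed with the upsampled fused map from
    the next smaller scale, so it carries features from all coarser scales.
    """
    fused: List[Grid] = [features[-1]]
    for fmap in reversed(features[:-1]):
        fused.insert(0, add_maps(fmap, upsample2x(fused[0])))
    return fused
```

With backbone maps of size 4x4, 2x2, and 1x1, the fused 4x4 output contains contributions from all three scales, which is the property the claim attributes to a fused feature map.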
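The two processing paths of the cascade refinement module, together with the output layer, can be sketched as a two-stage box regression followed by non-maximum suppression. The claim states only that the second path's output predictions are based, at least in part, on the refined anchors from the first path. The center/size offset encoding (dx, dy, dw, dh) used in Faster R-CNN-style detectors, the class-agnostic greedy NMS, and the IoU threshold of 0.5 are all assumptions. The per-path offsets are passed in directly as stand-ins for what the two representation transfer modules would feed each path, and every function name here is hypothetical.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]    # (cx, cy, w, h)
Delta = Tuple[float, float, float, float]  # (dx, dy, dw, dh), assumed encoding


def apply_deltas(box: Box, d: Delta) -> Box:
    """Shift the center by a fraction of the box size and rescale w, h."""
    cx, cy, w, h = box
    dx, dy, dw, dh = d
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))


def cascade_refine(anchors: List[Box], first: List[Delta],
                   second: List[Delta]) -> List[Box]:
    """First path: anchors -> refined anchors. Second path: regress from
    the refined anchors (not the original ones) to the output boxes."""
    refined = [apply_deltas(a, d) for a, d in zip(anchors, first)]
    return [apply_deltas(r, d) for r, d in zip(refined, second)]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two center/size boxes."""
    ix = min(a[0] + a[2] / 2, b[0] + b[2] / 2) - max(a[0] - a[2] / 2, b[0] - b[2] / 2)
    iy = min(a[1] + a[3] / 2, b[1] + b[3] / 2) - max(a[1] - a[3] / 2, b[1] - b[3] / 2)
    inter = max(0.0, ix) * max(0.0, iy)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], thresh: float = 0.5) -> List[int]:
    """Greedy class-agnostic NMS; returns kept indices, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) < thresh for k in keep):
            keep.append(i)
    return keep


# Per scale: (anchors, first-path deltas, second-path deltas, scores).
ScalePreds = Tuple[List[Box], List[Delta], List[Delta], List[float]]


def detect(per_scale: List[ScalePreds]) -> List[Tuple[Box, float]]:
    """Output layer: pool refined boxes from every scale, then apply NMS."""
    boxes: List[Box] = []
    scores: List[float] = []
    for anchors, first, second, s in per_scale:
        boxes.extend(cascade_refine(anchors, first, second))
        scores.extend(s)
    return [(boxes[i], scores[i]) for i in nms(boxes, scores)]
```

The point the sketch tries to capture is that the second-stage offsets are applied to the refined anchors rather than to the original ones, so the first path coarsely adjusts each anchor before the second path predicts the final box.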