US 12,169,956 B2
Image recognition method using a plurality of machine-learned inference devices, image recognition apparatus for the same, and non-transitory computer readable recording medium storing a program of the same
Takuya Miyamoto, Osaka (JP); Kazunori Tanaka, Osaka (JP); Kanako Morimoto, Osaka (JP); Rui Hamabe, Osaka (JP); and Naomichi Higashiyama, Osaka (JP)
Assigned to KYOCERA DOCUMENT SOLUTIONS INC., Osaka (JP)
Filed by KYOCERA Document Solutions Inc., Osaka (JP)
Filed on Dec. 28, 2021, as Appl. No. 17/563,355.
Claims priority of application No. 2020-218477 (JP), filed on Dec. 28, 2020.
Prior Publication US 2022/0207853 A1, Jun. 30, 2022
Int. Cl. G06V 10/77 (2022.01); G06N 20/00 (2019.01); G06V 10/422 (2022.01)
CPC G06V 10/422 (2022.01) [G06N 20/00 (2019.01); G06V 10/7715 (2022.01)]
7 Claims
 
1. An image recognition method comprising:
a feature amount extracting step of generating, from an input image, a base feature map group including a plurality of base feature maps;
an inferring step of providing a plurality of inference inputs based on the base feature map group to a plurality of machine-learned inference devices, respectively, and deriving a plurality of inference results corresponding respectively to the plurality of inference inputs, wherein the plurality of inference inputs are one or more base feature maps classified by the sizes, respectively; and
an integrating step of integrating the plurality of inference results by using weight factors as an integrator that has been subjected to machine-learning to derive a final inference result, wherein
each of the plurality of inference inputs is a base feature map set which includes some or all base feature maps of the plurality of base feature maps,
the some or all base feature maps of the base feature map set are different in part or whole from one another between the plurality of inference inputs, and
the weight factors are set based on inference accuracy of each of the plurality of inference devices by cross validation or a distribution of a specific feature amount of the input image and a distribution of the specific feature amount of an input image of training data used for machine-learning of the plurality of inference devices.
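
As an informal illustration only (not part of the claim and not the patentees' implementation), the inferring and integrating steps of claim 1 might be sketched as follows. The sketch assumes the base feature maps have already been extracted, treats each inference device as a hypothetical callable that returns per-class scores, and sets the weight factors by simply normalizing cross-validation accuracies. That normalization is an assumption made here for brevity; the claim's machine-learned integrator and its distribution-based alternative for setting weights are not modeled. All function and type names are invented for this example.

```python
from typing import Callable, Dict, List, Sequence, Tuple

FeatureMap = List[List[float]]   # one 2-D base feature map
Size = Tuple[int, int]           # (height, width)
# Hypothetical stand-in for a machine-learned inference device:
# takes a set of base feature maps, returns per-class scores.
InferenceDevice = Callable[[List[FeatureMap]], List[float]]


def group_by_size(base_maps: Sequence[FeatureMap]) -> Dict[Size, List[FeatureMap]]:
    """Classify base feature maps into sets keyed by (height, width)."""
    groups: Dict[Size, List[FeatureMap]] = {}
    for fmap in base_maps:
        size = (len(fmap), len(fmap[0]))
        groups.setdefault(size, []).append(fmap)
    return groups


def weights_from_accuracy(cv_accuracies: Sequence[float]) -> List[float]:
    """Assumed rule: weight factors proportional to cross-validation accuracy."""
    total = sum(cv_accuracies)
    if total <= 0:
        raise ValueError("at least one accuracy must be positive")
    return [acc / total for acc in cv_accuracies]


def integrate(results: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Combine per-device class scores into one weighted score per class."""
    if len(results) != len(weights):
        raise ValueError("need exactly one weight per inference result")
    n_classes = len(results[0])
    return [sum(w * r[c] for w, r in zip(weights, results)) for c in range(n_classes)]


def recognize(
    base_maps: Sequence[FeatureMap],
    devices: Dict[Size, InferenceDevice],
    cv_accuracy: Dict[Size, float],
) -> List[float]:
    """Run one device per size group, then integrate the results."""
    groups = group_by_size(base_maps)
    sizes = sorted(groups)  # fixed order so results and weights line up
    results = [devices[size](groups[size]) for size in sizes]
    weights = weights_from_accuracy([cv_accuracy[size] for size in sizes])
    return integrate(results, weights)
```

In this sketch each size group forms one inference input, so the inputs differ from one another in the base feature maps they contain, and a device with higher cross-validation accuracy contributes more to the final result.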