US 12,169,875 B2
Model training method and apparatus for image recognition, network device, and storage medium
Weidong Chen, Shenzhen (CN); Baoyuan Wu, Shenzhen (CN); Wei Liu, Shenzhen (CN); Yanbo Fan, Shenzhen (CN); Yong Zhang, Shenzhen (CN); and Tong Zhang, Shenzhen (CN)
Assigned to TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED, Shenzhen (CN)
Filed by TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED, Shenzhen (CN)
Filed on Oct. 28, 2020, as Appl. No. 17/083,180.
Application 17/083,180 is a continuation of application No. PCT/CN2019/110361, filed on Oct. 10, 2019.
Claims priority of application No. 201811180282.2 (CN), filed on Oct. 10, 2018.
Prior Publication US 2021/0042580 A1, Feb. 11, 2021
Int. Cl. G06T 1/20 (2006.01); G06F 18/214 (2023.01); G06N 3/04 (2023.01); G06N 3/063 (2023.01); G06N 3/08 (2023.01); G06T 3/4046 (2024.01); G06V 10/764 (2022.01); G06V 10/774 (2022.01); G06V 10/82 (2022.01)
CPC G06T 1/20 (2013.01) [G06F 18/214 (2023.01); G06N 3/04 (2013.01); G06N 3/063 (2013.01); G06N 3/08 (2013.01); G06T 3/4046 (2013.01); G06V 10/764 (2022.01); G06V 10/7753 (2022.01); G06V 10/82 (2022.01)] 16 Claims
OG exemplary drawing
 
1. A model training method for image recognition, performed by a network device, the method comprising:
obtaining a multi-label image training set, the multi-label image training set comprising a plurality of batches of training images, and each training image being annotated with a plurality of sample labels;
performing batch training a plurality of times on an image recognition model based on the plurality of batches of training images, comprising, for a current batch training:
selecting target training images of a current batch from the multi-label image training set for training a current model of the image recognition model;
performing label prediction on each target training image by using the current model, to obtain a plurality of predicted labels of each target training image;
obtaining a first training image overall type corresponding to each sample label of target training images of an adjacent batch training, and a number of times that training images having labels the same as the sample label occur successively within the adjacent batch training, the first training image overall type corresponding to the sample label indicating whether one or more successive training images having labels the same as the sample label exist in the adjacent batch training;
obtaining a second training image overall type corresponding to each sample label of the target training images of the current batch training, the second training image overall type corresponding to the each sample label indicating whether one or more successive training images having labels the same as the sample label exist in the current batch training;
obtaining a cross-entropy loss function corresponding to the plurality of sample labels of each target training image and updating a cross-entropy loss attenuation parameter of the cross-entropy loss function according to the first training image overall type, the second training image overall type, and the number of times, a positive label loss in the cross-entropy loss function being provided with a weight greater than 1, such that the positive label loss is greater than a negative label loss; and
converging the predicted labels and the sample labels of each target training image according to the cross-entropy loss function to update parameters of the current model, to obtain a trained model of the image recognition model corresponding to the current batch training,
wherein:
the method further comprises: before the performing label prediction on each target training image by using the current model, extracting a corresponding regional image from the target training image; scaling the regional image to a preset size, to obtain a scaled image; and performing random disturbance processing on the scaled image, to obtain a preprocessed training image; and
the performing label prediction on each target training image by using the current model comprises: performing label prediction on each preprocessed training image by using the current model.
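The preprocessing recited in the wherein clause can be illustrated with a minimal sketch. The preset size, the jitter-and-flip combination standing in for the "random disturbance processing," and the helper name preprocess_training_image are assumptions introduced for illustration; the patent does not fix these details.

```python
from PIL import Image
from torchvision import transforms

PRESET_SIZE = (224, 224)  # assumed preset size; the claim does not specify a value

# Stand-in "random disturbance": colour jitter plus a random horizontal flip.
disturb = transforms.Compose([
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.RandomHorizontalFlip(p=0.5),
])

def preprocess_training_image(path, region_box):
    """Extract the regional image, scale it to the preset size, and apply
    random disturbance processing to obtain the preprocessed training image."""
    image = Image.open(path).convert("RGB")
    regional = image.crop(region_box)       # corresponding regional image
    scaled = regional.resize(PRESET_SIZE)   # scaled image at the preset size
    return disturb(scaled)                  # preprocessed training image
```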
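The weighted, attenuated cross-entropy step can likewise be sketched under stated assumptions. The particular pos_weight value, the decay schedule, and the helpers weighted_multilabel_loss and update_attenuation are hypothetical; the claim only requires that the positive-label loss carry a weight greater than 1 and that the attenuation parameter be updated from the two training image overall types and the number of successive occurrences.

```python
import torch

def weighted_multilabel_loss(logits, targets, pos_weight=2.0, attenuation=None):
    """Per-label binary cross-entropy for multi-label training.

    pos_weight > 1 makes the positive-label loss larger than the
    negative-label loss; `attenuation` is an optional per-label factor a
    caller may update between batches (all values here are illustrative).
    """
    probs = torch.sigmoid(logits)
    eps = 1e-7
    pos_loss = -pos_weight * targets * torch.log(probs + eps)    # positive labels, weight > 1
    neg_loss = -(1.0 - targets) * torch.log(1.0 - probs + eps)   # negative labels, weight 1
    loss = pos_loss + neg_loss
    if attenuation is not None:                                   # cross-entropy loss attenuation parameter
        loss = loss * attenuation
    return loss.mean()

def update_attenuation(prev_batch_has_run, curr_batch_has_run, run_length, decay=0.9):
    """Hypothetical per-label attenuation update: when a label recurs in
    successive training images of both the adjacent and the current batch,
    its loss contribution is damped in proportion to the run length."""
    if prev_batch_has_run and curr_batch_has_run:
        return decay ** run_length
    return 1.0
```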