Machine learning solutions for network traffic classification commonly distinguish only two classes: malicious and benign. For known types of malware that serve different purposes and carry different risk levels, this is not sufficient. Therefore, multi-class classifiers are built and trained on millions to billions of samples; they are significantly more robust to malware variations than traditional signatures. Multi-class labels can correspond to different malware campaigns or families with well-known risk levels. Whenever a novel malware family is discovered, or a family is not covered by the existing classifier, the classifier is retrained.
Unfortunately, retraining the classifier and deploying the updated model can become expensive. In threat research, for example, it is not possible to label all the traffic. Before a classifier is deployed, its detections therefore have to be analyzed, and the detections on previously unlabeled traffic have to be labeled manually in order to estimate the true performance of the updated classifier. Moreover, if the updated classifier does not meet the required performance threshold, for example on precision, it cannot be deployed.
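To make the deployment check concrete, the following is a minimal sketch of a precision gate, not the authors' procedure. It assumes that analysts record, for each reviewed detection, the class predicted by the updated classifier and the true class assigned during manual labeling. Per-class precision is then the fraction of reviewed detections of a predicted class that were confirmed correct. The function names `per_class_precision` and `can_deploy`, the per-class gate, and the default threshold of 0.95 are hypothetical choices made for illustration.

```python
from collections import defaultdict


def per_class_precision(labeled_detections):
    """Estimate precision per predicted class from manually labeled detections.

    labeled_detections: iterable of (predicted_class, true_class) pairs,
    where true_class is the analyst's label (hypothetical input format).
    """
    hits = defaultdict(int)    # detections confirmed correct, per predicted class
    totals = defaultdict(int)  # all reviewed detections, per predicted class
    for predicted, actual in labeled_detections:
        totals[predicted] += 1
        if predicted == actual:
            hits[predicted] += 1
    return {cls: hits[cls] / totals[cls] for cls in totals}


def can_deploy(labeled_detections, min_precision=0.95):
    """Hypothetical gate: every predicted class must reach min_precision.

    Returns (deployable, failing), where failing maps each class below the
    threshold to its estimated precision.
    """
    precisions = per_class_precision(labeled_detections)
    failing = {cls: p for cls, p in precisions.items() if p < min_precision}
    return not failing, failing


if __name__ == "__main__":
    # Toy review results; the labels are invented for illustration only.
    reviewed = [
        ("family_a", "family_a"),
        ("family_a", "family_a"),
        ("family_a", "benign"),
        ("family_b", "family_b"),
    ]
    deployable, failing = can_deploy(reviewed, min_precision=0.9)
    print(deployable, failing)
```

In this toy run, `family_a` reaches a precision of only 2/3 and blocks deployment even though `family_b` is perfect. A gate on aggregate precision would behave differently; which variant applies in practice is a policy decision that the text above does not specify.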