Common object counting, also referred to as generic object counting, generally refers to computer vision functions associated with accurately predicting the number of instances of each object category present in an image. Instance segmentation generally refers to computer vision functions associated with identifying the locations of objects in an image with pixel-level accuracy. Both functions are especially difficult to perform on images of natural scenes. Such scenes can be indoor or outdoor and can contain objects from a very wide range of object categories (e.g., people, animals, plants, food, furniture, appliances, etc.) that have large intra-class variations.
To accurately perform object counting and instance segmentation, computer vision applications must account for a variety of technical problems. For example, one technical problem relates to counting objects in images in which large portions of the objects are hidden or heavily occluded. These occlusions may result from, inter alia, certain objects blocking views of other objects, portions of the objects lying on the periphery of the images (and thus being partially cut out of the images), poor lighting conditions, background clutter, and/or objects being located at far distances from the camera that captured the images. Other key challenges relate to accurately counting objects in categories with large intra-class variations (e.g., where a single class includes objects that vary greatly) and/or across diverse object categories (e.g., where there is a very large number of candidate object categories and the categories can vary greatly). Counting or classifying the objects in these scenarios can be extremely challenging. Additional challenges involve handling the co-existence of multiple instances of different objects in a single scene, and accounting for sparsity issues that arise because many object categories have a zero count across multiple images.