This specification relates to detecting objects in images.
Deep neural networks (DNNs), or “deep networks,” are machine learning systems that employ multiple layers of models, where the outputs of lower level layers are used to construct the outputs of higher level layers.
Object detection includes determining one or more portions of an image that include an object. The portions of an image that include the object can be designated by a bounding shape, e.g., a bounding box. An object mask indicates which portions of the image include the object and which portions of the image do not include the object. For example, for each pixel in an image, an object mask can assign a value of zero if the pixel is not part of the object in the image and a value of one if the pixel is part of the object.