An image recognition system provides a computer application that detects and identifies an object or multiple objects from a digital image or a video frame. Deep learning-based systems and methods have been achieving increasingly accurate performance on visual comprehension. However, it may be difficult to detect an object in an image that is relatively small, cluttered, or occluded by other objects. Other typical systems may fail to detect such instances, or detect part of an object as a whole, or combine different portions of an object into the entire object. For example, a system may erroneously detect a first user's face and a second user's shoulder that is occluded as the same user.