Understanding a visual scene portrayed in an image may employ techniques including one or more of detecting scene features, recognizing objects from the detected features (e.g., identifying, categorizing, and/or other techniques for object recognition), determining locations of objects within the scene, and/or determining other information associated with the scene. Contextual models for understanding a scene may attempt to build various models of various forms for recognition. A simplest among those may look at label co-occurrence or exclusion among object categories in a given image. Others may look at enhancement or inhibition of detections using both co-occurrence and spatial local contextual relations, for example, through the use of structured image labeling, visual phrases, or discovered object groups, and/or to order detectors such that weaker detectors may benefit from stronger ones. Some models may look at context across granularities, for example, using texture patches to enhance performance of object detectors.