Conventional computer-implemented top-down image recognition methods build classification models based on features extracted from a subset of pixels in a digital image. Features for a specific pixel are extracted from both the specific pixel and a surrounding feature extraction region comprising neighboring pixels. The digital images from which features are extracted are labeled by humans with ground truths representing the classification of each pixel or region of pixels. The labeled images are then used in conjunction with the extracted features to build models to automatically classify the features of new images. Conventional top-down image recognition frameworks rely on randomly determined pixels and feature extraction regions for feature extraction during a model training phase. Conventional feature extraction techniques using randomly determined pixels are easy to implement, but have several drawbacks. Randomly determined feature extraction regions are likely to overlap, causing some image data to be redundantly sampled. Randomly determined feature extraction regions may not cover an entire image, and the subsequently generated models may therefore have data gaps. Randomly determined feature extraction regions may also suffer from inhomogeneity in the characteristics of their constituent pixels. In a classification stage of a conventional top-down image recognition framework, classifying a digital image comprising several megapixels, pixel by pixel, is a time-intensive task.
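The overlap and data-gap drawbacks described above can be illustrated with a minimal sketch. It assumes square feature extraction regions centered on uniformly random pixels, which is one possible reading of "randomly determined" and not a detail taken from any particular conventional framework. The function names `sample_random_centers` and `coverage_stats`, and all parameter values, are hypothetical and chosen only for illustration.

```python
import random


def sample_random_centers(height, width, num_regions, seed=0):
    """Pick random center pixels (row, col) inside a height x width image.

    Assumption: centers are drawn uniformly and independently.
    """
    rng = random.Random(seed)
    return [(rng.randrange(height), rng.randrange(width))
            for _ in range(num_regions)]


def coverage_stats(height, width, centers, half_size):
    """Return (uncovered_fraction, overlapped_fraction) for the image.

    Each center defines a square region of side 2 * half_size + 1,
    clipped at the image border (an assumed region shape).
    uncovered_fraction:  pixels inside no region (data gaps).
    overlapped_fraction: pixels inside two or more regions (redundant sampling).
    """
    counts = [[0] * width for _ in range(height)]
    for cy, cx in centers:
        for y in range(max(0, cy - half_size), min(height, cy + half_size + 1)):
            for x in range(max(0, cx - half_size), min(width, cx + half_size + 1)):
                counts[y][x] += 1
    total = height * width
    uncovered = sum(1 for row in counts for c in row if c == 0)
    overlapped = sum(1 for row in counts for c in row if c > 1)
    return uncovered / total, overlapped / total


if __name__ == "__main__":
    # Hypothetical sizes: a 64 x 64 image sampled with 40 regions of 9 x 9 pixels.
    centers = sample_random_centers(64, 64, num_regions=40, seed=1)
    gaps, redundant = coverage_stats(64, 64, centers, half_size=4)
    print(f"uncovered: {gaps:.2%}, sampled more than once: {redundant:.2%}")
```

Running the sketch with different seeds or region counts shows the two quantities moving against each other: adding regions to close gaps also increases the share of pixels that are sampled more than once.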