1. Technical Field
This disclosure relates to extracting object edges from images.
2. Description of Related Art
Detecting object contours can be a key step in object recognition. See Biederman, I. (1987), “Recognition-by-components: A theory of human image understanding”, Psychological Review, 94(2), 115-147, doi:10.1037/0033-295X.94.2.115; Biederman, I., & Ju, G. (1988), “Surface versus edge-based determinants of visual recognition”, Cognitive Psychology, 20(1), 38-64, doi:10.1016/0010-0285(88)90024-2; DeCarlo, D. (2008, August 12), “Perception of line drawings”, Presented at SIGGRAPH 2008, Retrieved from http://gfx.cs.princeton.edu/proj/sg08lines/lines-7-perception.pdf; Kourtzi, Z., & Kanwisher, N. (2001), “Representation of Perceived Object Shape by the Human Lateral Occipital Complex”, Science, 293(5534), 1506-1509, doi:10.1126/science.1061133; Lowe, D. G. (1999), “Object recognition from local scale-invariant features”, The Proceedings of the Seventh IEEE International Conference on Computer Vision, 1999 (Vol. 2, pp. 1150-1157), IEEE, doi:10.1109/ICCV.1999.790410; Marr, D. (1983), “Vision: A Computational Investigation into the Human Representation and Processing of Visual Information”, Henry Holt and Company; Papari, G., & Petkov, N. (2011), “Edge and line oriented contour detection: State of the art”, Image and Vision Computing, 29(2-3), 79-103, doi:10.1016/j.imavis.2010.08.009.
One computation performed in visual cortex may be the extraction of object contours, where the first stage of processing is commonly attributed to V1 simple cells. The standard model of a simple cell (an oriented linear filter followed by a divisive normalization) may fit a wide variety of physiological data, but may perform poorly as a local edge detector when applied to natural images. The brain's ability to finely discriminate edges from non-edges may therefore depend on information encoded by local populations of oriented cells.
Algorithms that detect object contours in natural scenes may not be completely accurate. Raising thresholds or applying an expansive output nonlinearity (Heeger, D. J. (1992), “Half-squaring in responses of cat striate cells”, Visual Neuroscience, 9(05), 427-443, doi:10.1017/S095252380001124X) can sharpen tuning curves to an arbitrary degree, but may not be an effective strategy from an edge-detection perspective, because the underlying linear filtering operation may not be able to distinguish properly aligned low-contrast edges from misaligned high-contrast ones (or from a multitude of high-contrast non-edge structures). This weakness may not be remedied by output thresholding.
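The ambiguity described above can be illustrated with a minimal sketch. The filter shape (a derivative of a Gaussian), the 9x9 patch size, and the helper names `grid`, `vertical_edge_filter`, and `step_edge` are all assumptions chosen for illustration and are not taken from this disclosure.

```python
import numpy as np

def grid(size):
    """Pixel coordinates centered on the middle of a size x size patch."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    return x, y

def vertical_edge_filter(size=9):
    """Hypothetical oriented linear filter: x-derivative of a Gaussian,
    normalized so that its absolute weights sum to 1."""
    x, y = grid(size)
    sigma = size / 4.0
    k = x * np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return k / np.abs(k).sum()

def step_edge(size, angle_deg, contrast):
    """Ideal step edge through the patch center, rotated by angle_deg
    away from the filter's preferred (vertical) orientation."""
    x, y = grid(size)
    a = np.deg2rad(angle_deg)
    return contrast * np.sign(x * np.cos(a) + y * np.sin(a))

kernel = vertical_edge_filter()
# Linear response = inner product of the filter with the image patch.
aligned_low = float(np.sum(kernel * step_edge(9, 0.0, 0.5)))
misaligned_high = float(np.sum(kernel * step_edge(9, 45.0, 1.0)))
print(aligned_low, misaligned_high)
```

In this sketch the edge rotated 45 degrees away from the filter's preferred orientation, at full contrast, produces a larger response than the perfectly aligned edge at half contrast, so any output threshold that accepts the aligned edge also accepts the misaligned one.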
Other edge/contour detection algorithms may exploit the Gestalt principle of “good continuation” or related principles to improve detection performance. See Choe, Y., & Miikkulainen, R. (1998), “Self-organization and segmentation in a laterally connected orientation map of spiking neurons”, Neurocomputing, 21(1-3), 139-158, doi:10.1016/S0925-2312(98)00040-X; Elder, J. H., & Zucker, S. W. (1998), “Local scale control for edge detection and blur estimation”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(7), 699-716, doi:10.1109/34.689301; Grossberg, S., & Williamson, J. R. (2001), “A Neural Model of how Horizontal and Interlaminar Connections of Visual Cortex Develop into Adult Circuits that Carry Out Perceptual Grouping and Learning”, Cerebral Cortex, 11(1), 37-58, doi:10.1093/cercor/11.1.37; Guy, G., & Medioni, G. (1992), “Perceptual grouping using global saliency-enhancing operators”, 11th IAPR International Conference on Pattern Recognition, 1992. Vol. I. Conference A: Computer Vision and Applications, Proceedings (pp. 99-103), doi:10.1109/ICPR.1992.201517; Li, Z. (1998), “A Neural Model of Contour Integration in the Primary Visual Cortex”, Neural Computation, 10(4), 903-940, doi:10.1162/089976698300017557; Parent, P., & Zucker, S. W. (1989), “Trace inference, curvature consistency, and curve detection”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(8), 823-839, doi:10.1109/34.31445; Ross, W., Grossberg, S., & Mingolla, E. (2000), “Visual cortical mechanisms of perceptual grouping: interacting layers, networks, columns, and maps”, Neural Networks, 13(6), 571-588, doi:10.1016/S0893-6080(00)00040-X; Sha'ashua, A., & Ullman, S. (1988), “Structural Saliency: The Detection Of Globally Salient Structures using A Locally Connected Network”, Second International Conference on Computer Vision (pp. 321-327), doi:10.1109/CCV.1988.590008; VanRullen, R., Delorme, A., & Thorpe, S. (2001), “Feed-forward contour integration in primary visual cortex based on asynchronous spike propagation”, Neurocomputing, 38-40, 1003-1009, doi:10.1016/S0925-2312(01)00445-3; Williams, L. R., & Jacobs, D. W. (1997), “Stochastic Completion Fields: A Neural Model of Illusory Contour Shape and Salience”, Neural Computation, 9(4), 837-858, doi:10.1162/neco.1997.9.4.837; Yen, S. C., & Finkel, L. H. (1998), “Extraction of perceptually salient contours by striate cortical networks”, Vision Research, 38(5), 719-741, doi:10.1016/S0042-6989(97)00197-1. Measurements needed for contour extraction may lie in a butterfly-shaped “association field” centered on a reference edge. The field reflects contour continuity principles, see Field, D. J., Hayes, A., & Hess, R. F. (1993), “Contour integration by the human visual system: evidence for a local ‘association field’”, Vision Research, 33(2), 173-193, and includes an inhibitory region orthogonal to the edge that presumably reflects the tendency for only a single object contour at a time to pass through any given point in the image. See FIG. 1; Geisler, W. S., Perry, J. S., Super, B. J., & Gallogly, D. P. (2001), “Edge co-occurrence in natural images predicts contour grouping performance”, Vision Research, 41(6), 711-724; Kapadia, M. K., Westheimer, G., & Gilbert, C. D. (2000), “Spatial Distribution of Contextual Interactions in Primary Visual Cortex and in Visual Perception”, Journal of Neurophysiology, 84(4), 2048-2062; Li, Z. (1998), “A Neural Model of Contour Integration in the Primary Visual Cortex”, Neural Computation, 10(4), 903-940, doi:10.1162/089976698300017557.
Identifying a set of image measurements that are most useful for contour extraction can be a crucial step, but may leave open the question of how those measurements should be algorithmically combined to detect contours in natural images. A priori (e.g., geometric) models of edge/contour structure can provide important insights, but may face challenges, including the multiscale structure of natural object boundaries, lighting inhomogeneities, partial occlusions, disappearing local contrast, and optical effects such as blur from limited depth of field. All of these complexities, and others known and unknown, may in principle be treated as noise sources that randomly perturb filter values in the vicinity of a candidate edge, suggesting that a probabilistic, population-based approach to edge detection may be most appropriate. See Dollar, P., Tu, Z., & Belongie, S. (2006), “Supervised Learning of Edges and Object Boundaries”, Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2 (pp. 1964-1971), IEEE Computer Society, Retrieved from http://portal.acm.org/citation.cfm?id=1153171.1153683; Konishi, S., Yuille, A. L., Coughlan, J. M., & Zhu, S. C. (2003), “Statistical Edge Detection: Learning and Evaluating Edge Cues”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(1), 57-74, doi:http://doi.ieeecomputersociety.org/10.1109/TPAMI.2003.1159946.
The way a population of filter responses r1, r2 . . . rN should be combined to calculate the probability that an edge exists at a reference location and orientation may follow from Bayes' rule. See Equations 1 below. Bayesian inference has had successes in explaining behavior in sensory and motor tasks. See Fiser, J., Berkes, P., Orbán, G., & Lengyel, M. (2010), “Statistically optimal perception and learning: from behavior to neural representations”, Trends in Cognitive Sciences, 14(3), 119-130, doi:10.1016/j.tics.2010.01.003; Kording, K. P., & Wolpert, D. M. (2004), “Bayesian integration in sensorimotor learning”, Nature, 427(6971), 244-247, doi:10.1038/nature02169; Tenenbaum, J. B., Kemp, C., Griffiths, T. L., & Goodman, N. D. (2011), “How to Grow a Mind: Statistics, Structure, and Abstraction”, Science, 331(6022), 1279-1285, doi:10.1126/science.1192788; Weiss, Y., Simoncelli, E. P., & Adelson, E. H. (2002), “Motion illusions as optimal percepts”, Nature Neuroscience, 5(6), 598-604; Yuille, A., & Kersten, D. (2006), “Vision as Bayesian inference: analysis by synthesis?”, Trends in Cognitive Sciences, 10(7), 301-308, doi:10.1016/j.tics.2006.05.002; Yuille, A. L., & Grzywacz, N. M. (1988), “A computational theory for the perception of coherent visual motion”, Nature, 333(6168), 71-74, doi:10.1038/333071a0. However, in the context of edge detection within a V1-like architecture, there are thousands of oriented filters within a small distance of a candidate edge, and the amount of human-labeled ground truth data needed to fully populate the joint on-edge and off-edge likelihood functions grows exponentially with the number of filters used.
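For reference, the generic form of Bayes' rule for this setting can be written as follows. The notation here (the event labels "edge" and "no edge" and the prior P(edge)) is assumed for illustration and may differ from that used in the disclosure's own Equations 1.

```latex
P(\text{edge} \mid r_1, \ldots, r_N)
  = \frac{P(r_1, \ldots, r_N \mid \text{edge})\, P(\text{edge})}
         {P(r_1, \ldots, r_N \mid \text{edge})\, P(\text{edge})
          + P(r_1, \ldots, r_N \mid \text{no edge})\, P(\text{no edge})}
```

If each filter value were quantized into B bins (a hypothetical discretization), each of the two joint likelihood tables in this expression would contain B^N cells. With B = 10 and N = 6, for example, that is 10^6 cells per table, each of which would need enough labeled samples to be estimated reliably.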
Konishi, S., Yuille, A. L., Coughlan, J. M., & Zhu, S. C. (2003), “Statistical Edge Detection: Learning and Evaluating Edge Cues”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(1), 57-74, doi:http://doi.ieeecomputersociety.org/10.1109/TPAMI.2003.1159946, dealt with this “curse of dimensionality” by limiting their analysis to small sets of off-the-shelf edge filters centered on a candidate edge (up to 6 filters at a time). They used an adaptive binning method to efficiently tabulate the multi-dimensional on- and off-edge likelihood functions from preexisting human-labeled edge databases. Their approach led to improved edge detection performance compared to single-feature edge classifiers, but did not address the question of whether, or how, human-labeled data could be collected in such a structured way as to facilitate the identification of filter combinations where the participating filters are individually (1) informative as to the presence of an edge, and (2) statistically independent both when an edge is present and when one is absent, that is, “class conditionally independent” (CCI). In this special case of CCI filters, edge probability can be calculated based on much less human-labeled data. In particular, evaluating Bayes' rule requires knowing only the 1-dimensional marginal likelihood distributions for each of the N filter values on and off edges, rather than the N-dimensional joint likelihood distributions of the N filters together.
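The CCI special case can be sketched in a few lines. This is only an illustration under stated assumptions: the fixed shared binning, the add-one smoothing, the synthetic Gaussian data, and the function names `fit_marginals` and `edge_probability` are hypothetical and do not describe the method of this disclosure or of Konishi et al.

```python
import numpy as np

def fit_marginals(responses, labels, bins):
    """Estimate 1-D histograms P(r_i | edge) and P(r_i | no edge).

    responses: (n_samples, n_filters) filter values.
    labels:    (n_samples,) booleans, True where an edge was labeled.
    bins:      shared 1-D array of bin edges (assumed fixed binning).
    """
    n_filters = responses.shape[1]
    on = np.empty((n_filters, len(bins) - 1))
    off = np.empty_like(on)
    for i in range(n_filters):
        h_on, _ = np.histogram(responses[labels, i], bins=bins)
        h_off, _ = np.histogram(responses[~labels, i], bins=bins)
        # Add-one smoothing avoids zero likelihoods in empty bins.
        on[i] = (h_on + 1) / (h_on.sum() + len(h_on))
        off[i] = (h_off + 1) / (h_off.sum() + len(h_off))
    return on, off

def edge_probability(r, on, off, bins, prior_edge):
    """Posterior P(edge | r_1..r_N), assuming the filters are CCI."""
    idx = np.clip(np.digitize(r, bins) - 1, 0, on.shape[1] - 1)
    rows = np.arange(len(r))
    # Under CCI the joint likelihood ratio is a product of 1-D ratios,
    # computed here as a sum of logs.
    log_lr = np.sum(np.log(on[rows, idx]) - np.log(off[rows, idx]))
    log_prior_odds = np.log(prior_edge) - np.log1p(-prior_edge)
    return 1.0 / (1.0 + np.exp(-(log_lr + log_prior_odds)))

# Synthetic stand-in for labeled data: 3 independent filters whose
# mean response is 1.0 on edges and 0.0 off edges.
rng = np.random.default_rng(0)
labels = rng.random(2000) < 0.5
means = np.where(labels, 1.0, 0.0)[:, None]
responses = rng.normal(loc=means, scale=1.0, size=(2000, 3))
bins = np.linspace(-4.0, 5.0, 19)

on, off = fit_marginals(responses, labels, bins)
p_high = edge_probability(np.array([2.2, 2.2, 2.2]), on, off, bins, 0.5)
p_low = edge_probability(np.array([-0.8, -0.8, -0.8]), on, off, bins, 0.5)
print(p_high, p_low)
```

With 18 bins per filter, this sketch stores 2 x 3 x 18 histogram cells, whereas the two joint tables over the same three filters would need 2 x 18^3 cells. The gap widens exponentially as more filters are added.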