Object Recognition, Visual Cues, and Learning

Object recognition is an important yet challenging high-level task, with applications in many areas including image segmentation, pattern recognition, classification, motion tracking, and 3D reconstruction.

Recently, I have been asked several times why I followed the human visual system as a model for object recognition in my thesis (PS: my thesis tackles two medical image processing problems, automatic anatomy recognition and registration). Psychophysical experiments answer this question easily: human vision extracts useful information from images. Reconstructing the three-dimensional structure of the environment and recognizing the objects that populate it are among the most important functions of the human visual system [1]. I like to answer with the postulate given in [1]: “Recognition of an object requires a model which associates an image with a memory of that object. Typically we do not recognize things that we have not seen before (human faces and snakes are rare exceptions), i.e. Models are usually not innate, therefore we must construct models from our daily visual experience. We are, however, good at generalisation; we will recognise a person as such even if we have never met that specific person before”.

The human visual system is able to recognize objects despite tremendous variation in their appearance on the retina resulting from changes in view, size, lighting, etc. This ability, known as “invariant” object recognition, is central to visual perception, yet its computational underpinnings are poorly understood. Any given object can cast an infinite number of different images onto the retina, depending on the object’s position relative to the viewer, the configuration of light sources, and the presence of other objects. Even so, we rapidly recognize thousands of distinct object classes without apparent effort. At present, we know little about how the brain achieves robust, “invariant” object recognition, and reproducing this ability remains a major challenge in the construction of artificial vision systems [2].

We see the same process in image analysts, especially radiologists. For example, as I mention in my thesis, when radiologists look for the boundary of the liver in a low-resolution CT scan, they might use the shape and appearance of other nearby objects (right kidney, spleen, left kidney), and they might use their prior knowledge (a model) of the object of interest. The pose and size of the object of interest, and its relationships with nearby objects, are encoded in their attentional preferences in the brain.

A nice example of this is when radiologists complete the missing boundaries of an object of interest using their prior knowledge of that object. Humans outperform computers in recognition challenges (high-level processing), but they fall short when the problem is low-level! By following the human recognition system, many image analysis techniques can be improved further, especially in medical image processing applications where human interaction is needed.

[1] Joachim M. Buhmann, Jitendra Malik, and Pietro Perona. Image recognition: Visual grouping, recognition, and learning. PNAS Vol 96(25), pp. 14203-14204, 1999.

[2] Zoccolan, D., et al. A rodent model for the study of invariant visual object recognition. PNAS Vol 106(21), pp. 8748-8753, 2009.

