Position invariant recognition in the visual system with cluttered environments.
Stringer SM, Rolls ET.
We investigate the effects of cluttered environments on the performance of VisNet, a hierarchical multilayer model of invariant object recognition in the visual system that employs learning rules utilising a trace of previous neural activity. This class of model relies on the spatio-temporal statistics of natural visual inputs to associate together different exemplars of the same stimulus or object, which tend to occur in temporal proximity. In this paper the different exemplars of a stimulus are the same stimulus in different positions. First, it is shown that if the stimuli have been learned previously against a plain background, then they can be correctly recognised even in environments with cluttered (e.g. natural) backgrounds which form complex scenes. Second, it is shown that the functional architecture has difficulty in learning new objects if they are presented against cluttered backgrounds. It is suggested that processes such as the use of a high-resolution fovea, or attention, may be particularly useful in suppressing the effects of background noise and in segmenting objects from their background when new objects need to be learned. Third, however, it is shown that this problem may be ameliorated by the prior existence of stimulus-tuned feature-detecting neurons in the early layers of VisNet, and that these feature-detecting neurons may be set up through previous exposure to the relevant class of objects. Fourth, we extend these results to partially occluded objects, showing that (in contrast with many artificial vision systems) correct recognition in this class of architecture can occur if the objects have been learned previously without occlusion.
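
The abstract does not state the learning rule itself, so the following is only a minimal sketch of one commonly used form of trace rule: the postsynaptic firing is smoothed into an exponentially decaying trace, and the weight change is proportional to that trace multiplied by the current input. The function names, the parameters `alpha` and `eta`, their values, and the four-element toy input are all illustrative assumptions rather than details taken from the paper, and the sketch leaves out everything else in the architecture (layers, competition between neurons, realistic images).

```python
def update_trace(prev_trace, activity, eta=0.8):
    """Blend current firing with the previous trace (assumed form of the trace)."""
    return (1.0 - eta) * activity + eta * prev_trace


def trace_learning_step(weights, inputs, activity, prev_trace, alpha=0.1, eta=0.8):
    """One weight update for a single neuron: delta_w = alpha * trace * input."""
    trace = update_trace(prev_trace, activity, eta)
    new_weights = [w + alpha * trace * x for w, x in zip(weights, inputs)]
    # Renormalise the weight vector so that it cannot grow without bound.
    norm = sum(w * w for w in new_weights) ** 0.5
    if norm > 0:
        new_weights = [w / norm for w in new_weights]
    return new_weights, trace


def present_sequence(weights, sequence, alpha=0.1, eta=0.8):
    """Present input vectors in temporal order, carrying the trace between them."""
    trace = 0.0
    for inputs in sequence:
        activity = sum(w * x for w, x in zip(weights, inputs))
        weights, trace = trace_learning_step(
            weights, inputs, activity, trace, alpha, eta
        )
    return weights


# Toy example: the same stimulus seen at position A, then at position B.
position_a = [1.0, 0.0, 0.0, 0.0]
position_b = [0.0, 1.0, 0.0, 0.0]
initial = [1.0, 0.0, 0.0, 0.0]  # neuron initially responds to position A only

with_trace = present_sequence(initial, [position_a, position_b], eta=0.8)
without_trace = present_sequence(initial, [position_a, position_b], eta=0.0)
```

In this toy run the neuron is silent when the stimulus appears at position B, but the trace left over from position A is still non-zero, so the weight onto the position-B input grows. With `eta=0.0` the rule reduces to a plain Hebbian update and that weight stays at zero. This is the sense in which temporal proximity lets different positions of one stimulus become associated onto the same neuron.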