From #pixels to objects: how the #brain builds rich representation of the natural world
Posted on April 15th, 2017
04/06/2017 @RutgersUniversity, Easton Hub Auditorum, Fiber Optics Building, Busch Campus
Jack Galliant @UCBerkeley presented a survey of current research on mapping the neurophysiology of the visual system in the brain. He first talked about the overall view of visual processing since the Felleman and Van Essen article in Cerebral Cortex in 1992. Their work on macaque monkey showed that any brain area has a 50% chance of being connected to any other part of the brain. Visual processing can be split into 3 areas
1.Early visual area – 2.intermediate visual areas – 3.high level visual areas
With pooling nonlinear transformations between areas (the inspiration for the non-linear mappings in convolutional neural nets (CNN)). The visual areas were identified by retinotopic maps – about 60 areas in humans with macaques having 10 to 15 areas in the V1 area.
Another important contribution was by David J. Field who argued that the mammalian visual system can only be understood relative to the images it is exposed to. In addition, natural images have a very specific structure – 1/f noise in the power spectrum – due to the occlusion of images which can be viewed from any angle (see Olshausen & Field, American scientist, 2000)
This lead to research resolving natural images by characterizing them by the correlation of pairs of points. Beyond pairs of points that approach becomes too computational intensive. In summary, natural images are only a small part of the universe of images (most of which humans classify as white noise)
Until 2012, researchers needed to specify the characteristics to identify items in images, but LeCun, Bengio & Hinton, Nature, 2015 showed that Alexnet could resolve many images using multiple layer models, faster computation, and lots of data. These deep neural nets work well, but the reasons for their success have yet to be worked out (He estimates it will take 5 to 10 years for the math to catch up).
One interesting exercise is running a CNN and then looking for activation in a structure in the brain: mapping the convolutional layers and feature layers to the correspondence on layers in the visual cortex. This reveals that V1 has bi-or tri-phasic functions – Gabor functions in different orientations. This is highly efficient as a sparse code needs to activate as few neurons as possible.
Next they used motion-energy models to see how mammals detect motion in the brain Voxels in V1 (Shinji Nishimoto). They determined that monitoring takes 10 to 20ms using Utah arrays to monitor single neurons. They have animal watch movies and analyze the input images using combination of complex and simple cell models (use Keras) to model neurons in V1 and V2 using a 16ms time scale.
High level visual areas
Jack then talked about research identify neurons in high level visual areas that respond to specific stimuli. Starting with fMRI his groups (Huth, Nishimoto, Vu & Gallant, Neuron, 2012) has identified many categories: face areas vs. objects; place minus face. By presented images and mapping which voxels in the brain are activated one can see how the 2000 categories are mapped in the brain using wordmap as the labels. Similar concepts are mapped to similar locations in the brain, but specific items in the semantic visual system interact with the semantic language areas – so a ‘dog’ can active many areas so it can be used in different ways and can be unified as needed. Each person will have a different mapping depending on their previous good and bad experiences with dogs.
He talked about other topics including the challenges of determining how things are stored in places: Fourier power, object categories, subjective distance. In order to activate any of these areas in isolation, one needs enough stimulus to activate the earlier layers. They have progress by building a decoder from the knowledge of the voxel which run from the brain area backwards to create stimulus. A blood flow model are used with a 2 second minimum sampling period. But there is lots of continuity so they can reconstruct a series of images.
Intermediate visual area
Intermediate visual areas between the lower and higher levels of processing are hard to understand – looks at V4. They respond to shapes of intermediate complexity, but not much else like a curvature detector. Using fMRI they know what image features correlate with specific areas, but there is no strong indication differentiating one layer from another. Using the Utah array, they need to do a log-polar transform to improve prediction in V4. Using a receptor field model, they can create a predictor frame and match brain activity to images that gave the largest response.
To improve prediction on V4, Utah arrays need to do a log-polar transform. However, the images are messy and predicting V4 is not the same as understanding V4.
Finally, he talked about attenuation and tuning effects on single neurons. In an experiment in subjects watched a movie and were asked to search for either humans or vehicles, there were changes in the semantic map based on the search criterion. These tuning shift effects are a function of distance to visual periphery: Attentional effects are small in V1 and get larger in the ensuing layers.
In the Q&A, he made the following points:
- The visual word form area in the brain becomes active as you learn to read. This change does not occur for people who are illiterate.
- One of the experimental assumptions is that the system is stationary, so there is not adaptation. If adaptation does occur, then they cannot compute a noise ceiling for the signals.
[Neural nets take inspiration from the neurobiology, especially the creation of convolutional neural nets, but there is now feedback with neurobiology using the tools created in machine learning to explore possible models of brain mapping. Does the pervasive existence of Gabor filters lead to an argument that their presence indicates that natural images are closely allied with fractal patterns?]