Wednesday, December 12, 2018

Recurrent Models of Visual Attention

Volodymyr Mnih Nicolas Heess Alex Graves Koray Kavukcuoglu

Image prediction tasks (classification, object detection)


  • Train a model to process images by focusing on a sequence of small regions of an image, instead of the whole image. Use reinforcement learning to control the next region to focus on.
  • Recurrent Attention Model (RAM):
    • Glimpse sensor: Extracts a patch of the image
    • Glimpse network: Converts patch+location into a vector space, the glimpse representation. Trainable parameters
    • Main network: 
      • RNN: Glimpse representation + Hidden state = New hidden state
      • Reinforcement learning:
        • Agent: Action + New Location to focus attention on
        • Action execution -> New image + Reward
        • Goal: Maximize reward over a number of glimpses

RAM with 4-8 glimpses can perform as good as a FC/CNN on MNIST and exceeds FC/CNNs on translated MNIST (MNIST images translated in position with noise added).

No comments:

Post a Comment