Event

Marco Pedersoli, ETS

Friday, May 26, 2017, 13:45 to 15:30
Room 5340, Universite de Montreal, 2920, ch. de la Tour, Montreal, QC, CA

Areas of Attention for Image Captioning.

We propose “Areas of Attention”, a novel attention-based model for automatic image captioning. Our approach models the dependencies between image regions, caption words, and the state of an RNN language model, using three pairwise interactions. In contrast to previous attention-based approaches that associate image regions only to the RNN state, our method allows a direct association between caption words and image regions. During training these associations are inferred from image-level captions, akin to weakly-supervised object detector training. These associations help to improve captioning by localizing the corresponding regions during testing. We also propose and compare different ways of generating attention areas: CNN activation grids, object proposals, and spatial transformer nets applied in a convolutional fashion. Spatial transformers give the best results. They allow for image-specific attention areas, and can be trained jointly with the rest of the network. Our attention mechanism and spatial transformer attention areas together yield state-of-the-art results on the MSCOCO dataset.
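As a rough illustration of the "three pairwise interactions" mentioned in the abstract, the sketch below scores every (word, region) pair as the sum of a word-state, a word-region, and a region-state term, then normalizes over all pairs. The bilinear form of each term, the function names (`bilinear`, `joint_word_region_probs`), the matrix names (`W_wh`, `W_wr`, `W_rh`), and all dimensions are assumptions made for this example; the abstract does not specify them, and this is not presented as the speaker's implementation.

```python
import math
import random


def bilinear(x, W, y):
    """Return x^T W y for vectors x, y and a matrix W stored as nested lists."""
    return sum(x[i] * W[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))


def joint_word_region_probs(h, words, regions, W_wh, W_wr, W_rh):
    """Softmax over all (word, region) pairs of a sum of three pairwise scores.

    Assumed (hypothetical) score: s(w, r, h) = w^T W_wh h + w^T W_wr r + r^T W_rh h.
    """
    scores = [[bilinear(w, W_wh, h) + bilinear(w, W_wr, r) + bilinear(r, W_rh, h)
               for r in regions] for w in words]
    peak = max(max(row) for row in scores)  # subtract the max for numerical stability
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]


def random_matrix(rows, cols, rng):
    """Small random matrix standing in for learned parameters or embeddings."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


# Toy sizes, chosen only for the example.
rng = random.Random(0)
d_h, d_w, d_r = 5, 4, 3        # RNN state, word embedding, region feature sizes
n_words, n_regions = 6, 4      # vocabulary size and number of attention areas

h = [rng.gauss(0.0, 1.0) for _ in range(d_h)]   # current RNN state
words = random_matrix(n_words, d_w, rng)        # one embedding per vocabulary word
regions = random_matrix(n_regions, d_r, rng)    # one feature vector per image region
W_wh = random_matrix(d_w, d_h, rng)
W_wr = random_matrix(d_w, d_r, rng)
W_rh = random_matrix(d_r, d_h, rng)

probs = joint_word_region_probs(h, words, regions, W_wh, W_wr, W_rh)

# Marginalizing the joint distribution gives a next-word distribution
# and an attention distribution over regions.
word_marginal = [sum(row) for row in probs]
region_marginal = [sum(probs[i][j] for i in range(n_words))
                   for j in range(n_regions)]
```

Under these assumptions, summing the joint table over regions gives a next-word distribution, and summing over words gives attention weights over regions. The word-region term is what lets a word be tied to a region directly rather than only through the RNN state.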

Bio: Marco Pedersoli obtained his Ph.D. from the Autonomous University of Barcelona (June 2012, with distinction and best thesis award) on Hierarchical Multi-resolution Detection of objects in images. He completed a postdoctoral fellowship at KU Leuven, where he developed several innovative approaches for object detection, action classification and pose estimation based on weakly-supervised methods. In September 2015 he moved to INRIA-Grenoble, where he developed new techniques for the automatic description and understanding of images. Since February 2017 he has been an assistant professor at the École de technologie supérieure in Montreal. He has published more than 20 articles in peer-reviewed international journals and top peer-reviewed conferences in computer vision.