The next seminar of the A3SI team of LIGM (a joint research unit of Université Paris Est) will take place on Monday, June 20 at 2:00 pm, in meeting room B 412 of the IMAGINE group (ENPC - Bat. Coriolis).
Abstract: Computer vision has recently made great progress through the use of deep learning, trained with large-scale labeled data. However, good labeled data requires expertise and curation, can be expensive to collect, and will always be available in smaller quantities than unlabeled data. Can we discover useful visual representations without the use of explicitly curated labels? To this end, we explore the paradigm of self-supervised learning: using the data itself as labels. We will describe several case studies in which we train a neural net to predict one part of a raw sensory signal from another. First, we learn to in-paint missing pixels given their surrounding context. Second, we learn to add missing color to black-and-white photos. Finally, we learn to predict missing sounds for silent videos. In each case, by learning to predict data from data, our models end up learning visual representations that are broadly useful. We show that these representations encode meaningful information about the semantics and physics of the world, and can be leveraged to aid a variety of downstream recognition tasks.
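The key idea of the abstract, using the data itself as labels, can be illustrated with a minimal NumPy sketch for the in-painting case: a patch is removed from a raw image, and those removed pixels become the supervision target, so no human annotation is needed. The function name, masking geometry, and array sizes here are illustrative assumptions, not details from the talk.

```python
import numpy as np

def make_inpainting_pair(image, top, left, h, w):
    """Turn one unlabeled image into a (input, target) training pair:
    the pixels inside the masked patch serve as the label."""
    # Pixels the network must predict (the "label" extracted from the data).
    target = image[top:top + h, left:left + w].copy()
    # Network input: the same image with the patch zeroed out.
    masked = image.copy()
    masked[top:top + h, left:left + w] = 0.0
    return masked, target

# Example: a random 8x8 grayscale "image" with a 3x3 hole.
rng = np.random.default_rng(0)
img = rng.random((8, 8))
x, y = make_inpainting_pair(img, 2, 2, 3, 3)
```

A model trained to map `x` back to `y` (e.g. with an L2 loss) learns from raw pixels alone; the colorization and sound-prediction case studies follow the same pattern with different splits of the signal.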