Victor Boutin

Hi 👋

I am a CNRS Research Scientist at CerCo (Toulouse) working at the interface of artificial intelligence and computational neuroscience. My research asks a simple question: how does the brain generalize so well from so little data? My working hypothesis is that the brain implements an internal generative model of the world. Intuitively, a generative model is like a simulator that captures the rules that produce the data. If you know the rules, and not just a list of past examples, you can imagine plausible new cases and predict what you haven't seen yet (e.g., recognize an object from a new viewpoint after seeing it only once). To test this hypothesis, I (1) develop mathematical theories of how the brain processes information; (2) implement these theories as deep learning models, especially generative models; and (3) evaluate them against behavioral and neural data.

I earned my PhD at the Institute of Neuroscience of Marseille under the supervision of Laurent U Perrinet, and then completed a postdoc with Thomas Serre at ANITI (Toulouse, France) and Brown University (Providence, USA). My long-term goal is to reverse-engineer the computations of cognition.


Main research articles

Follow the Energy, Find the Path: Riemannian Metrics from Energy-Based Models
L. Béthune*, D. Vigouroux, Y. Du, R. VanRullen, T. Serre, V. Boutin*
NeurIPS 2025

In this work, we propose a method for deriving Riemannian metrics directly from pretrained Energy-Based Models (EBMs), a class of generative models that assign low energy to high-density regions. These metrics define spatially varying distances, enabling the computation of geodesics: shortest paths that follow the intrinsic geometry of the data manifold. Ours is the first work to derive such data-aware metrics from EBMs, unlocking scalable, geometry-driven learning for generative modeling and simulation.
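
For intuition, one simple way to turn an energy function E(x) into a Riemannian metric is a conformal scaling of the Euclidean metric (an illustrative form; the construction used in the paper may differ):

\[
g(x) = e^{\beta E(x)} I_d,
\qquad
L(\gamma) = \int_0^1 \sqrt{\dot{\gamma}(t)^\top g(\gamma(t))\, \dot{\gamma}(t)}\; dt,
\]

so that paths crossing high-energy (low-density) regions accumulate large length, and the geodesic between two points is the curve minimizing L(γ). Here β is an illustrative temperature-like hyperparameter, not a quantity taken from the paper.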

Latent Representation Matters: Human-like Sketches in One-shot Drawing Tasks
V. Boutin*, R. Mukherji, A. Agrawal, S. Muzellec, T. Fel, T. Serre, R. VanRullen
NeurIPS 2024

Humans can effortlessly draw new categories from a single exemplar, a feat that has long posed a challenge for generative models. Here, we study how different inductive biases shape the latent space of Latent Diffusion Models (LDMs). We demonstrate that LDMs with redundancy-reduction and prototype-based regularizations produce near-human-like drawings, in terms of both sample recognizability and originality, and better mimic human perception (as evaluated psychophysically). Overall, our results suggest that the gap between humans and machines in one-shot drawing is almost closed.
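
As a rough illustration of what a redundancy-reduction regularizer on a latent space can look like, here is a minimal Barlow-Twins-style decorrelation penalty; this is an assumed stand-in for intuition, not the exact regularizer used in the paper:

```python
import torch

def redundancy_reduction_loss(z: torch.Tensor, off_diag_weight: float = 5e-3) -> torch.Tensor:
    """Decorrelation penalty on a batch of latents z of shape (batch, dim).

    Pushes the latent correlation matrix toward the identity: unit variance on
    the diagonal, zero correlation off the diagonal. Illustrative only; not the
    exact regularizer from the paper.
    """
    z = (z - z.mean(dim=0)) / (z.std(dim=0) + 1e-6)  # standardize each latent dimension
    n = z.shape[0]
    c = (z.T @ z) / n                                # empirical correlation matrix (dim, dim)
    diag = torch.diagonal(c)
    on_diag = (diag - 1.0).pow(2).sum()              # pull diagonal entries toward 1
    off_diag = (c - torch.diag(diag)).pow(2).sum()   # push off-diagonal entries toward 0
    return on_diag + off_diag_weight * off_diag
```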

Saliency strikes back: How filtering out high frequencies improves white-box explanations
S. Muzellec*, T. Fel, V. Boutin, L. Andéol, R. VanRullen, T. Serre
ICML 2024

Attribution methods explain model decisions by scoring input contributions. We show that efficient “white-box” methods rely on gradients polluted by high-frequency artifacts. FORGrad, a simple Fourier low-pass filter with architecture-specific cutoffs, cleans these gradients. Across models, it consistently boosts the faithfulness of white-box methods, making them competitive with costlier black-box approaches while staying lightweight.
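
A minimal sketch of the idea (illustrative only; FORGrad selects the cutoff per architecture rather than using a fixed value): low-pass filter an attribution map in the Fourier domain before using it.

```python
import numpy as np

def lowpass_saliency(saliency: np.ndarray, cutoff: float = 0.1) -> np.ndarray:
    """Low-pass filter a 2D saliency/gradient map in the Fourier domain.

    `cutoff` is the radius of the kept frequency band as a fraction of the
    Nyquist frequency (illustrative default).
    """
    h, w = saliency.shape
    fy = np.fft.fftfreq(h)[:, None]                      # vertical frequencies (cycles/pixel)
    fx = np.fft.fftfreq(w)[None, :]                      # horizontal frequencies (cycles/pixel)
    keep = np.sqrt(fx ** 2 + fy ** 2) <= 0.5 * cutoff    # low-frequency mask
    return np.real(np.fft.ifft2(np.fft.fft2(saliency) * keep))
```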

Diffusion Models as Artists: Are we Closing the Gap between Humans and Machines?
V. Boutin*, T. Fel, L. Singhal, R. Mukherji, A. Nagaraj, J. Colin, T. Serre
ICML 2023 (Oral)

An important milestone for AI is the development of algorithms that can produce drawings indistinguishable from those of humans. Here, we adapt the 'diversity vs. recognizability' scoring framework from Boutin et al. (2022) and find that one-shot diffusion models have indeed started to close the gap between humans and machines. However, comparing the category-diagnostic features used by humans, collected through an online psychophysics experiment, against those derived from diffusion models reveals that humans rely on fewer and more localized features. Overall, our study suggests that diffusion models have significantly improved the quality of machine-generated drawings; however, a gap between humans and machines remains.

A Holistic Approach to Unifying Automatic Concept Extraction and Concept Importance Estimation
T. Fel*, V. Boutin*, M. Moayeri, R. Cadène, L. Béthune, M. Chalvidal, T. Serre
NeurIPS 2023 (Spotlight)

In this article, we demonstrate that all concept extraction methods can be viewed as dictionary learning methods. We leverage this common view to develop a comprehensive framework for comparing and improving concept extraction methods. Furthermore, we extensively investigate the estimation of concept importance and show that it is possible to derive optimal importance estimation formulas in certain cases. We also highlight the significance of local concept importance in addressing a crucial question in Explainable Artificial Intelligence (XAI): identifying data points that are classified for similar reasons.
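
For intuition, here is a minimal sketch of the dictionary-learning view, using non-negative matrix factorization as one instance (variable names and the synthetic data are illustrative, not the paper's pipeline): a matrix of deep activations is factorized into per-sample concept coefficients and a concept dictionary.

```python
import numpy as np
from sklearn.decomposition import NMF

# A: (n_samples, n_features) matrix of non-negative deep activations
# (e.g., post-ReLU features); random data here just to make the snippet run.
rng = np.random.default_rng(0)
A = np.abs(rng.normal(size=(500, 256)))

n_concepts = 10                                   # illustrative number of concepts
nmf = NMF(n_components=n_concepts, init="nndsvda", max_iter=500, random_state=0)
U = nmf.fit_transform(A)                          # (n_samples, n_concepts) concept coefficients
W = nmf.components_                               # (n_concepts, n_features) concept dictionary
A_hat = U @ W                                     # activations approximated as U @ W
```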

Unlocking Feature Visualization for Deeper Networks with Magnitude Constrained Optimization
T. Fel*, T. Boissin*, V. Boutin*, A. Picard*, P. Novello*, J. Colin, D. Linsley, T. Rousseau, R. Cadène, L. Gardes, T. Serre
NeurIPS 2023

Feature visualization techniques have stagnated since the remarkable 2017 work of Chris Olah and the Clarity team at OpenAI, and the methods proposed at the time are very difficult to make work on modern models (e.g., Vision Transformers). In this article, we propose a simple technique to revive feature visualization on modern models. Our method is based on a magnitude constraint, which ensures that the generated images have a magnitude spectrum similar to that of real images, while avoiding the need to manage an additional hyperparameter.
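
A rough sketch of the core trick (illustrative; the full method includes more machinery): parameterize the image in the Fourier domain, keep its magnitude spectrum fixed to a natural-image-like magnitude, and optimize only the phase.

```python
import torch

def image_from_phase(phase: torch.Tensor, magnitude: torch.Tensor) -> torch.Tensor:
    """Synthesize an image whose Fourier magnitude is fixed; only the phase is free.

    phase:     (H, W) learnable tensor (the optimization variable).
    magnitude: (H, W) fixed magnitude spectrum, e.g. averaged over natural images.
    """
    spectrum = torch.polar(magnitude, phase)   # magnitude * exp(i * phase)
    return torch.fft.ifft2(spectrum).real      # back to pixel space

# Sketch of the optimization loop (model, natural_magnitude, and neuron_idx are placeholders):
# phase = torch.zeros(224, 224, requires_grad=True)
# opt = torch.optim.Adam([phase], lr=0.05)
# for _ in range(256):
#     img = image_from_phase(phase, natural_magnitude)
#     loss = -model(img[None, None])[0, neuron_idx]   # maximize the target unit's activation
#     opt.zero_grad(); loss.backward(); opt.step()
```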

Diversity vs. Recognizability: Human-like generalization in one-shot generative models
V. Boutin*, L. Singhal, X. Thomas, T. Serre
NeurIPS 2022

Here, we propose a new framework to evaluate one-shot generative models along two axes: sample recognizability vs. diversity (i.e., intra-class variability). Using this framework, we perform a systematic evaluation of representative one-shot generative models on the Omniglot dataset of handwritten characters. We show that GAN-like and VAE-like models fall on opposite ends of the diversity-recognizability space. Using the diversity-recognizability framework, we were able to identify models and parameters that closely approximate human data.
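
As a rough sketch of the two axes (the paper's exact estimators differ): recognizability can be proxied by the accuracy of a pretrained classifier on the generated samples, and diversity by the intra-class spread of those samples in a feature space.

```python
import numpy as np

def recognizability(pred_labels: np.ndarray, target_labels: np.ndarray) -> float:
    """Fraction of generated samples that a pretrained classifier assigns to the
    intended category (illustrative proxy for recognizability)."""
    return float((pred_labels == target_labels).mean())

def diversity(features: np.ndarray) -> float:
    """Intra-class variability: mean distance of generated-sample features to
    their class centroid (illustrative proxy for diversity)."""
    centroid = features.mean(axis=0)
    return float(np.linalg.norm(features - centroid, axis=1).mean())
```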

Pooling strategies in V1 can account for the functional and structural diversity across species
V. Boutin*, A. Franciosini, F. Chavane, L. Perrinet
PLOS Computational Biology (2022)

V1 neurons are orientation-selective with varying phase selectivity (simple → complex). Prior models tie phase invariance to orientation maps in higher mammals but can’t explain complex cells in species without maps. Using a convolutional Sparse Deep Predictive Coding (SDPC) model, we show a single mechanism—pooling—accounts for both: pooling in feature space drives orientation map formation, while pooling in retinotopic space yields complex-cell invariance. SDPC thus explains complex cells with or without orientation maps and offers a unified account of V1’s structural and functional diversity.
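
For intuition, the two pooling operations can be sketched as follows (illustrative tensor shapes and max-pooling as the pooling function; not the actual SDPC implementation):

```python
import torch
import torch.nn.functional as F

# Feature maps of shape (batch, channels, height, width); here the 16 channels are
# grouped into 4 groups of 4 related filters (illustrative layout only).
x = torch.randn(1, 16, 32, 32)

# Pooling in feature space: combine responses across groups of feature channels.
feature_pooled = x.view(1, 4, 4, 32, 32).amax(dim=2)   # (1, 4, 32, 32)

# Pooling in retinotopic space: combine responses across neighboring positions.
spatial_pooled = F.max_pool2d(x, kernel_size=2)         # (1, 16, 16, 16)
```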

Sparse Deep Predictive Coding captures contour integration capabilities of the early visual cortex
V. Boutin*, A. Franciosini, F. Chavane, F. Ruffier, L. Perrinet
PLOS Computational Biology (2021)

Recurrent and feedback connections shape contextual processing in early vision, but most models treat their neural and representational effects separately. Sparse Deep Predictive Coding (SDPC) unifies them: sparse coding handles intralayer recurrence, while predictive coding mediates interlayer feedforward/feedback in a hierarchical convolutional network. Trained as a two-layer V1/V2 proxy, SDPC learns V1-like oriented receptive fields and more complex V2 features; feedback reorganizes V1 interaction maps (association fields / "good continuation"), promoting contour integration. The same feedback boosts robustness to noise and blur and improves reconstructions, linking neural- and representation-level feedback in one model.
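
In equation form, a two-layer sparse predictive coding objective of this kind can be sketched as (notation illustrative; see the paper for the exact convolutional formulation):

\[
\mathcal{L}(\gamma_1, \gamma_2) = \tfrac{1}{2}\lVert x - D_1 \gamma_1 \rVert_2^2
+ \tfrac{1}{2}\lVert \gamma_1 - D_2 \gamma_2 \rVert_2^2
+ \lambda_1 \lVert \gamma_1 \rVert_1
+ \lambda_2 \lVert \gamma_2 \rVert_1,
\]

where D_1 and D_2 are dictionaries, the sparse codes γ_1 and γ_2 are inferred by minimizing this energy, and the middle term is the top-down prediction of the first layer by the second, i.e. the feedback pathway.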

Iterative VAE as a predictive brain model for out-of-distribution generalization
V. Boutin*, A. Zerroug, M. Jung, T. Serre
Workshop on Shared Visual Representations in Human and Machine Intelligence at NeurIPS 2020

Primate vision generalizes to novel degradations. We link predictive coding networks (PCNs) to variational autoencoders (VAEs), deriving a formal correspondence. This motivates iterative VAEs (iVAEs) as a variational counterpart to PCNs. iVAEs show markedly better OOD generalization than PCNs and standard VAEs. We also introduce a per-sample recognizability metric testable via psychophysics, positioning iVAEs as a promising neuroscience model.
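
One way to picture the iterative inference (illustrative notation; the paper derives the exact correspondence): instead of a single amortized encoder pass, the approximate-posterior parameters φ are refined by gradient ascent on the ELBO at inference time,

\[
\mathrm{ELBO}(x;\phi) = \mathbb{E}_{q_{\phi}(z \mid x)}\!\left[\log p_{\theta}(x \mid z)\right]
- \mathrm{KL}\!\left(q_{\phi}(z \mid x)\,\Vert\, p(z)\right),
\qquad
\phi_{t+1} = \phi_t + \eta\, \nabla_{\phi}\,\mathrm{ELBO}(x;\phi_t),
\]

which mirrors the recurrent error-correction dynamics of predictive coding networks.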

Effect of top-down connections in Hierarchical Sparse Coding
V. Boutin*, A. Franciosini, F. Ruffier, L. Perrinet
Neural Computation (2020)

Hierarchical sparse coding (HSC) is often solved layer-wise, but neuroscience suggests adding top-down feedback (predictive coding). We introduce a two-layer sparse predictive coding model (2L-SPC) and compare it to a two-layer hierarchical Lasso (Hi-La). Across four datasets, 2L-SPC transfers error between layers, yielding lower prediction error, faster inference, and better second-layer representations. It also speeds learning and discovers more generic, larger-extent features.
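
To make the comparison concrete (illustrative notation, mirroring the SDPC objective sketched above): Hi-La infers the codes greedily, layer by layer,

\[
\gamma_1^{\star} = \arg\min_{\gamma_1} \tfrac{1}{2}\lVert x - D_1 \gamma_1 \rVert_2^2 + \lambda_1 \lVert \gamma_1 \rVert_1,
\qquad
\gamma_2^{\star} = \arg\min_{\gamma_2} \tfrac{1}{2}\lVert \gamma_1^{\star} - D_2 \gamma_2 \rVert_2^2 + \lambda_2 \lVert \gamma_2 \rVert_1,
\]

whereas 2L-SPC minimizes the joint two-layer energy over (γ_1, γ_2) simultaneously, so the interlayer prediction error \(\lVert \gamma_1 - D_2 \gamma_2 \rVert_2^2\) also shapes the first-layer code.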