Data-Efficient Visual Concept Bottleneck Models

Interpretable, Dataset-Specific Visual Concepts with Minimal Data

📄 View Paper 💻 View Code ICML 2025

Katharina Prasse*

University of Mannheim

katharina.prasse@uni-mannheim.de

Patrick Knab*

TU Clausthal

patrick.knab@tu-clausthal.de

Sascha Marton

TU Clausthal

sascha.marton@tu-clausthal.de

Christian Bartelt

TU Clausthal

christian.bartelt@tu-clausthal.de

Margret Keuper

University of Mannheim; Max Planck Institute for Informatics

keuper@mpi-inf.mpg.de

* Equal Contribution

DCBM Example Explanations
Figure: DCBMs extract image regions as concepts. Using vision foundation models, we crop image regions and use them as concepts for CBM training. With only a few concept samples (50 images per class), DCBMs offer interpretability even for fine-grained classification.

Abstract

Concept Bottleneck Models (CBMs) enhance the interpretability of neural networks by basing predictions on human-understandable concepts. However, current CBMs typically rely on concept sets extracted from large language models or extensive image corpora, limiting their effectiveness in data-sparse scenarios. We propose Data-efficient CBMs (DCBMs), which reduce the need for large sample sizes during concept generation while preserving interpretability. DCBMs define concepts as image regions detected by segmentation or detection foundation models, allowing each image to generate multiple concepts across different granularities. Exclusively containing dataset-specific concepts, DCBMs are well suited for fine-grained classification and out-of-distribution tasks. Attribution analysis using Grad-CAM demonstrates that DCBMs deliver visual concepts that can be localized in test images. By leveraging dataset-specific concepts instead of predefined or general ones, DCBMs enhance adaptability to new domains.

Core Technologies

  • 🔍 Segmentation & Detection: SAM, SAM2, GroundingDINO, Mask R-CNN
  • 💡 Backbone: CLIP ViT-L/14, CLIP ViT-B/16, ResNet50
  • 🎯 Tasks: Fine-Grained & OOD Classification

Introduction

Deep neural networks achieve state-of-the-art performance but remain opaque, which hinders trust in critical applications. Explainable AI (XAI) seeks to make such models transparent; Concept Bottleneck Models (CBMs) predict through human-understandable concepts and weight them to form decisions. Early CBMs relied on manually defined concepts, while recent works derive text-aligned concept sets from large corpora, which can misalign with the data in fine-grained or low-data regimes. DCBMs instead define concepts as visual regions extracted by foundation models, yielding multi-granularity, dataset-specific concepts that remain robust under domain shifts.

Method

  1. Proposal: Sample n images per class; apply SAM2/GroundingDINO for region proposals and filter them by area.
  2. Clustering: Encode the proposals and cluster their embeddings (k-means, k = 2048) into visual concepts (see the first sketch below).
  3. Bottleneck Training: Project image embeddings onto the centroids and train a sparse linear classifier with L1 regularization (see the second sketch below).
  4. Naming: (Optional) Match centroids to CLIP text embeddings for labels; prune spurious concepts.
DCBM Framework
Figure: DCBM pipeline—Proposal, Clustering, Bottleneck Training, Naming.
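
To make steps 1-2 (proposal and clustering) concrete, here is a minimal sketch. It assumes the original SAM automatic mask generator in place of SAM2/GroundingDINO and OpenCLIP ViT-L/14 as the image encoder; the checkpoint path, the min_area threshold, and the sampled_images list (roughly 50 images per class) are illustrative placeholders, not the paper's exact settings.

```python
import numpy as np
import torch
from PIL import Image
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator
from sklearn.cluster import KMeans
import open_clip

# --- Step 1: region proposals (sketch uses SAM; the paper also uses SAM2 / GroundingDINO) ---
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")  # placeholder checkpoint path
mask_generator = SamAutomaticMaskGenerator(sam)

def propose_crops(image_np, min_area=2000):
    """Generate masks for one image, drop small regions, and return bounding-box crops."""
    crops = []
    for m in mask_generator.generate(image_np):   # each m has 'segmentation', 'area', 'bbox'
        if m["area"] < min_area:                  # area filter removes noisy fragments
            continue
        x, y, w, h = m["bbox"]                    # bbox is in XYWH format
        crops.append(Image.fromarray(image_np[y:y + h, x:x + w]))
    return crops

# --- Step 2: encode the crops with CLIP and cluster them into visual concepts ---
model, _, preprocess = open_clip.create_model_and_transforms("ViT-L-14", pretrained="openai")
model.eval()

def encode_crops(crops):
    batch = torch.stack([preprocess(c) for c in crops])
    with torch.no_grad():
        feats = model.encode_image(batch)
    return torch.nn.functional.normalize(feats, dim=-1).cpu().numpy()

# `sampled_images`: ~50 images per class, loaded elsewhere as HxWx3 uint8 arrays
crop_feats = np.concatenate([encode_crops(propose_crops(img)) for img in sampled_images])
kmeans = KMeans(n_clusters=2048, n_init="auto").fit(crop_feats)   # needs >= 2048 crops in total
concept_centroids = kmeans.cluster_centers_       # one centroid per visual concept
```

Clustering in CLIP space lets visually and semantically similar crops from different images collapse into a single concept, which is what keeps the number of required concept samples low.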
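Continuing from the sketch above (it reuses model and concept_centroids), steps 3-4 represent each image by its similarity to the concept centroids, fit a sparse linear classifier on those activations, and optionally name each centroid by its closest CLIP text embedding. The train_image_feats/train_labels arrays, the candidate phrase list, and the regularization strength C are assumptions for illustration, not the paper's reported configuration.

```python
import numpy as np
import torch
import open_clip
from sklearn.linear_model import LogisticRegression

# --- Step 3: bottleneck training on concept activations ---
# L2-normalize the centroids so the projection acts as a cosine similarity.
centroids = concept_centroids / np.linalg.norm(concept_centroids, axis=1, keepdims=True)

def to_concept_space(image_feats):        # image_feats: (N, D) L2-normalized CLIP features
    return image_feats @ centroids.T      # (N, 2048) concept activations

X_train = to_concept_space(train_image_feats)   # CLIP features of the full training images
clf = LogisticRegression(penalty="l1", solver="saga", C=0.1, max_iter=2000)
clf.fit(X_train, train_labels)            # L1 keeps only a few concepts per class

# --- Step 4 (optional): name concepts by their nearest CLIP text embedding ---
candidate_phrases = ["striped wing", "red beak", "water surface"]  # hypothetical vocabulary
tokenizer = open_clip.get_tokenizer("ViT-L-14")
with torch.no_grad():
    text_feats = model.encode_text(tokenizer(candidate_phrases))   # `model` from the sketch above
text_feats = torch.nn.functional.normalize(text_feats, dim=-1).cpu().numpy()
concept_names = [candidate_phrases[i] for i in (centroids @ text_feats.T).argmax(axis=1)]
```

At test time a prediction decomposes into classifier weights times concept activations, so each decision can be traced back to a small set of (optionally named) image regions.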

Experiments

DCBMs achieve competitive performance while requiring significantly less data for concept generation. Our experiments demonstrate:

  • Data efficiency: 96% reduction in concept generation data
  • Performance gap: 6% vs. a CLIP linear probe
  • OOD robustness: 22% error increase on ImageNet-R

Key Findings

  • ✓ Achieve comparable accuracy with only 50 images per class for concept generation
  • ✓ Maintain performance across ImageNet, Places365, CUB, and CIFAR benchmarks
  • ✓ Superior out-of-distribution robustness compared to baseline methods
  • ✓ Dataset-specific concepts improve fine-grained classification accuracy
Figure: Comparison of concept explanations across CBMs. Concept-based explanations for two images from the Places365 dataset, generated by four CBMs: CDM (Panousis et al., 2023), LF-CBM (Oikarinen et al., 2023), DNCBM (Rao et al., 2024), and our DCBM. The examples for CDM, LF-CBM, and DNCBM are adapted from Rao et al. (2024).

Conclusion

DCBMs offer a visually grounded concept bottleneck approach that requires only a small number of samples for concept generation while remaining interpretable. Future directions include extending DCBMs to regression tasks and refining concept naming.