Paper Accepted at ICCV 2025 Workshop 🎉
Title: Foundation versus Domain-specific Models: Performance Comparison, Fusion, and Explainability in Face Recognition
Authors: Md Redwan Karim Sony, Parisa Farmanifard, Arun Ross, Anil K. Jain
In this work, we address the following key question: How do generic foundation models (e.g., CLIP, BLIP, GPT-4o, Grok-4) compare against domain-specific models (e.g., AdaFace, ArcFace) on the face recognition task?
Through extensive experiments on multiple benchmark datasets, we report:
- Domain-specific models consistently outperform zero-shot foundation models across face recognition benchmarks.
- Foundation models benefit from contextual information, performing better on over-segmented face images than on tightly cropped ones.
- A simple score-level fusion of foundation and domain-specific models improves performance at low false match rates.
- Foundation models such as GPT-4o and Grok-4 provide explainability to the face recognition pipeline and, in some cases, resolve low-confidence errors made by AdaFace.
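The score-level fusion mentioned above can be sketched as follows. This is a minimal illustration of one common scheme (min-max normalization followed by a weighted sum of the two matchers' similarity scores); the exact normalization and weighting used in the paper may differ.

```python
import numpy as np

def min_max_normalize(scores):
    """Scale a batch of similarity scores to [0, 1].
    (Hypothetical choice of normalization for illustration.)"""
    s = np.asarray(scores, dtype=float)
    return (s - s.min()) / (s.max() - s.min())

def fuse_scores(foundation_scores, domain_scores, weight=0.5):
    """Score-level fusion: weighted sum of normalized scores from a
    foundation model matcher and a domain-specific matcher.
    `weight` is an assumed tunable parameter, not a value from the paper."""
    f = min_max_normalize(foundation_scores)
    d = min_max_normalize(domain_scores)
    return weight * f + (1 - weight) * d

# Example: fuse similarity scores for three probe-gallery pairs.
fused = fuse_scores([0.2, 0.5, 0.8], [0.1, 0.4, 0.9], weight=0.5)
```

In practice, a decision threshold on the fused score is then chosen to meet a target false match rate.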
This work underscores the importance of judiciously combining domain-specific face recognition models with foundation models for improved accuracy and interpretability.