Vision-Language Model Object Counting

Overview
Vision-language models (VLMs) struggle to count objects accurately in dense scenes. This research explores strategies for improving VLM counting performance while investigating what VLMs are actually doing when they return a “count” for a dense scene.
Research Questions
- What strategies can improve VLM counting accuracy?
- What internal mechanisms do VLMs use when counting objects?
- How do VLMs represent and process dense scenes during counting tasks?
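One way to make the first research question measurable is to score each strategy's answers against ground-truth counts. The sketch below is illustrative only and is not this project's evaluation code. It assumes the VLM returns a free-form text answer, that the first number in that answer is the intended count, and that exact-match accuracy and mean absolute error (MAE) are reasonable metrics. The function names `parse_count` and `score_counts` are hypothetical.

```python
import re
from statistics import mean
from typing import Optional

# Matches "1,200"-style numbers first, then plain digit runs like "12345".
_NUMBER = re.compile(r"\d{1,3}(?:,\d{3})+|\d+")


def parse_count(response: str) -> Optional[int]:
    """Return the first integer in a free-form VLM answer, or None if absent.

    Assumption: the first number mentioned is the model's count.
    Number words such as "twelve" are not handled.
    """
    match = _NUMBER.search(response)
    return int(match.group().replace(",", "")) if match else None


def score_counts(responses: list[str], ground_truth: list[int]) -> dict:
    """Score VLM counting answers with exact match and MAE (hypothetical metrics)."""
    if not ground_truth or len(responses) != len(ground_truth):
        raise ValueError("need one non-empty ground-truth count per response")

    parsed = [parse_count(r) for r in responses]
    valid = [(p, g) for p, g in zip(parsed, ground_truth) if p is not None]

    # Unparsed answers count as wrong for exact match.
    exact = sum(p == g for p, g in valid) / len(ground_truth)
    # MAE is computed only over answers that contained a number.
    mae = mean(abs(p - g) for p, g in valid) if valid else float("nan")
    return {"exact_match": exact, "mae": mae, "unparsed": parsed.count(None)}


if __name__ == "__main__":
    answers = ["There are 12 apples.", "I count 1,200 birds", "Too many to count"]
    print(score_counts(answers, [12, 1150, 40]))
```

Because MAE ignores unparsed answers, reporting the `unparsed` tally alongside both metrics keeps refusals such as "Too many to count" from silently lowering the error.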
Status
Currently conducting experiments as part of research with Prof. Jivko Sinapov at Tufts University.
