Vision-Language Model Object Counting

Published: November 01, 2024

VLM Counting Evolution

Overview

Vision-language models (VLMs) struggle to count objects in dense scenes. This research surveys strategies for improving VLM counting performance and investigates what VLMs are actually doing when they return a “count” for a dense scene of objects.

Research Questions

What strategies can improve VLM counting accuracy?
What internal mechanisms do VLMs use when counting objects?
How do VLMs represent and process dense scenes during counting tasks?

Status

Experiments are currently underway as part of research with Prof. Jivko Sinapov at Tufts University.


Denny Loevlie
