Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models

A research paper that provides a preliminary comparison of Gemini and GPT-4V across vision-language tasks, useful for understanding model strengths.

Updated: 2026-06-07

Summary

Found in a curated awesome list under the 'Research Papers' section. Covers performance across Vision-Language Capability, Interaction with Humans, and Temporal Understanding. arxiv.org is a standard repository for academic research.
