ViQAgent: zero-shot video question answering via agent with open-vocabulary grounding validation

Recent advancements in Video Question Answering (VideoQA) have introduced LLM-based agents, modular frameworks, and procedural solutions, yielding promising results. These systems use dynamic agents and memory-based mechanisms to break down complex tasks and refine answers. However, significant impr...


Authors:
Montes Buitrago, Tony Santiago
Resource type:
Undergraduate thesis
Publication date:
2024
Institution:
Universidad de los Andes
Repository:
Séneca: repositorio Uniandes
Language:
English
OAI Identifier:
oai:repositorio.uniandes.edu.co:1992/75454
Online access:
https://hdl.handle.net/1992/75454
Keywords:
Video question-answering
Video grounding
Multimodal
Large language model
Chain-of-thought
Vision-language models
Open-vocabulary
Engineering
Rights:
Embargoed access
License:
Attribution 4.0 International