Study of a visual perception system with multiple models for visual disability assistance


Authors:
Apresa Echeverría, Rubens André
Heras Gómez, Camilo Andrés
Linero Caro, Jaymed Daniel
Resource type:
Publication date:
2025
Institution:
Universidad del Norte
Repository:
Repositorio Uninorte
Language:
spa
OAI Identifier:
oai:manglar.uninorte.edu.co:10584/13409
Online access:
http://hdl.handle.net/10584/13409
Keywords:
YOLO
Multiple Models
Visual Disability Assistance
Visual
Rights
License
Universidad del Norte
Description
Summary: The project presents a technological solution to assist people with visual impairments through a mobile application developed with Flutter. This application integrates advanced computer vision models (YOLOv11 and YOLO-World) to detect and recognize objects in real time. Once objects in the environment are identified using the phone's camera, the application generates detailed audio descriptions with the support of Gemini AI, allowing users to better understand their surroundings. The system is based on a hybrid architecture that combines local processing on the device with cloud services, ensuring efficiency and scalability. It has been tested and validated with real users to ensure its practical usability. The use of established datasets such as COCO and LVIS allows the models to recognize a wide variety of everyday objects, improving the richness and relevance of the information provided to users. The development process followed the agile SCRUM methodology, which allowed for iterative improvements in development, accessibility, validation, and user experience. The project demonstrates how the integration of generative artificial intelligence with robust object detection models can contribute to greater independence and mobility for people with visual impairments in diverse real-world environments.
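The detection-to-description pipeline summarized above — filter the detector's raw output, aggregate the recognized objects, and hand a scene summary to a generative model for narration — can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Detection` shape, the confidence threshold, and the prompt wording are assumptions, and the call to Gemini is left out (only the prompt that would be sent to it is built).

```python
from dataclasses import dataclass
from collections import Counter

@dataclass
class Detection:
    """One object detection as a YOLO-style model might report it (hypothetical shape)."""
    label: str
    confidence: float

def summarize_detections(detections, min_confidence=0.5):
    """Drop low-confidence detections and count the remaining objects per label."""
    return Counter(
        d.label for d in detections if d.confidence >= min_confidence
    )

def build_description_prompt(counts):
    """Compose a text prompt for a generative model (e.g. Gemini) from object counts."""
    if not counts:
        return "No objects were detected in the current camera frame."
    parts = [
        f"{n} {label}" + ("s" if n > 1 else "")
        for label, n in sorted(counts.items())
    ]
    return ("Describe the following scene for a visually impaired user: "
            + ", ".join(parts) + ".")

# Example frame: raw detections, one below the confidence threshold
frame = [
    Detection("chair", 0.91),
    Detection("chair", 0.86),
    Detection("person", 0.78),
    Detection("dog", 0.32),   # below 0.5, filtered out
]
counts = summarize_detections(frame)
prompt = build_description_prompt(counts)
# prompt → "Describe the following scene for a visually impaired user: 2 chairs, 1 person."
```

In the hybrid architecture the abstract describes, the detection and filtering steps would run on the device, while the prompt would be sent to the cloud service only when a narration is actually needed — keeping the latency-sensitive vision loop local.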