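Development of an automatic transcription and contextualization system for generating study guides (original title: Desarrollo de un sistema de transcripción y contextualización automática para la generación de guías de estudio)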

Authors:
Benítez Avilez, Felipe José
Gómez Rosales, Laura Sofía
Valencia Gómez, Fernando Mateo
Resource type:
Publication date:
2025
Institution:
Universidad del Norte
Repository:
Repositorio Uninorte
Language:
spa
OAI Identifier:
oai:manglar.uninorte.edu.co:10584/13381
Online access:
http://hdl.handle.net/10584/13381
Keywords:
Automatic Speech Recognition (ASR), Natural Language Processing (NLP), Educational Technology, Higher Education, Contextualization, Fine-tuning, Study Guides, Latin American Spanish, Vector Search, Markdown
Rights
License
Universidad del Norte
Description
Summary: Higher education faces persistent challenges in making the content of theory-heavy courses accessible and comprehensible to students. This project details the development of a web application that automatically transcribes class audio and contextualizes the information to generate comprehensive study guides, with the goal of enhancing student learning across demanding disciplines. The system uses a Vosk (Kaldi) Automatic Speech Recognition (ASR) model, fine-tuned for Latin American Spanish and academic discourse, to produce accurate transcriptions. These transcriptions are then enriched with student-taken notes and open-access bibliographic resources. The core output is an automatically generated, structured, and referenced study guide that can be exported in Markdown format. Key technologies include Python, ChromaDB for vector data management, and JavaScript for the web interface. The initiative aims to provide an open-source, adaptable solution that improves understanding and academic performance in subjects with high conceptual density.
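
The Markdown export step described in the summary could look roughly like the sketch below. It is a minimal illustration and not the project's actual code: the `Section` structure, the `render_study_guide` function, and the layout of the guide are all hypothetical, and the transcription (Vosk) and retrieval (ChromaDB) stages are assumed to have already produced the text passed in.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """One topic of the guide (hypothetical structure, not from the project)."""
    title: str
    transcript_excerpt: str                                  # text from the ASR stage
    notes: list[str] = field(default_factory=list)           # student-taken notes
    reference_ids: list[int] = field(default_factory=list)   # 1-based indexes into the reference list


def render_study_guide(course: str, sections: list[Section], references: list[str]) -> str:
    """Render sections and their references as a single Markdown document."""
    lines = [f"# Study guide: {course}", ""]
    for section in sections:
        # Reject citations that point outside the reference list.
        for ref_id in section.reference_ids:
            if not 1 <= ref_id <= len(references):
                raise ValueError(f"unknown reference id {ref_id} in section {section.title!r}")
        cites = "".join(f"[{ref_id}]" for ref_id in section.reference_ids)
        lines.append(f"## {section.title}")
        lines.append("")
        lines.append(f"{section.transcript_excerpt.strip()} {cites}".rstrip())
        lines.append("")
        if section.notes:
            lines.append("Student notes:")
            lines.append("")
            lines.extend(f"- {note}" for note in section.notes)
            lines.append("")
    if references:
        lines.append("## References")
        lines.append("")
        lines.extend(f"{i}. {ref}" for i, ref in enumerate(references, start=1))
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder data only; real input would come from the earlier pipeline stages.
    demo = render_study_guide(
        "Example course",
        [Section("Example topic", "Transcribed explanation of the topic.", ["A student note."], [1])],
        ["Placeholder open-access reference"],
    )
    print(demo)
```

Keeping the rendering step as a pure function of already-prepared text makes it easy to test independently of the speech recognition and vector search components.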