Tool for automatically reducing the duration of a speech in English, adapted to a speaker's voice characteristics

Authors:
Alarcón Pedroza, Lebis Armando
Gutiérrez Erazo, José Luis
Resource type:
Publication date:
2015
Institution:
Universidad de San Buenaventura
Repository:
Repositorio USB
Language:
spa
OAI Identifier:
oai:bibliotecadigital.usb.edu.co:10819/3106
Online access:
http://hdl.handle.net/10819/3106
Keywords:
Digital signals
Speech processing
Pattern recognition
Neural networks (Computers)
Machine learning (Artificial intelligence)
Speech
Digital audio
Rights
License
http://purl.org/coar/access_right/c_abf2
Description
Summary: The development of the tool was divided into three phases: manual labeling of important audio segments, extraction of audio parameters, and system training. For the labeling phase, a web application was implemented to speed up the process. Feature extraction was performed with the MIRTOOLBOX library, and the classifiers and the interface were implemented in MATLAB. Five classifiers were compared: linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), logistic regression, artificial neural networks (ANNs), and support vector machines (SVMs). The best accuracy was obtained with ANNs (79.19%) and SVMs (81.21%). The reduction percentage was measured on three new audio recordings, giving an average reduction of 27.34% with ANNs and 24.50% with SVMs. In addition, comprehension tests were performed using a reduced audio recording created by the tool, and an information loss of 16.67% was found. It was concluded that the prosodic and spectral parameters provide sufficient data for classifying segments by relative importance. It was also found that combining the prosodic and spectral parameters in the same data set yields better accuracy.
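
The summary reports reduction percentages but does not state how they were computed. As an illustration only, the Python sketch below assumes the reduction percentage is the share of the original duration taken up by segments labeled as unimportant and therefore removed. The `Segment` type, the `reduction_percentage` function, and the example values are hypothetical and do not come from the thesis, whose implementation was in MATLAB.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One labeled audio segment (hypothetical structure, not from the thesis)."""
    start: float      # segment start time, in seconds
    end: float        # segment end time, in seconds
    important: bool   # label assumed to come from a trained classifier


def reduction_percentage(segments):
    """Return the percentage of total duration removed when only
    segments labeled as important are kept (assumed definition)."""
    total = sum(s.end - s.start for s in segments)
    if total <= 0:
        raise ValueError("total duration must be positive")
    kept = sum(s.end - s.start for s in segments if s.important)
    return 100.0 * (total - kept) / total


# Made-up example: 10 s of audio, of which a 2 s segment is labeled unimportant.
segments = [
    Segment(0.0, 4.0, True),
    Segment(4.0, 6.0, False),
    Segment(6.0, 10.0, True),
]
print(reduction_percentage(segments))  # 20.0
```

Under this assumed definition, the reported averages of 27.34% (ANNs) and 24.50% (SVMs) would correspond to the mean of this value over the three test recordings.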