Tool for automatically reducing the duration of a speech in English, adapted to the voice characteristics of a speaker
- Authors:
- Alarcón Pedroza, Lebis Armando
Gutiérrez Erazo, José Luis
- Resource type:
- Publication date:
- 2015
- Institution:
- Universidad de San Buenaventura
- Repository:
- Repositorio USB
- Language:
- spa (Spanish)
- OAI Identifier:
- oai:bibliotecadigital.usb.edu.co:10819/3106
- Online access:
- http://hdl.handle.net/10819/3106
- Keywords:
- Digital signals
Speech processing
Pattern recognition
Neural networks (Computers)
Machine learning (Artificial intelligence)
Speech
Digital audio
- Rights
- License
- http://purl.org/coar/access_right/c_abf2
Summary: The development of the tool was divided into three phases: manual labeling of important audio segments, extraction of audio parameters, and system training. For the labeling phase, a web application was implemented to speed up the process. Feature extraction was performed with the MIRtoolbox library, and the classifiers and the interface were implemented in MATLAB. Five classifiers were compared: linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), logistic regression, artificial neural networks (ANNs), and support vector machines (SVMs). The best accuracies were obtained with SVMs (81.21%) and ANNs (79.19%). Tests on three new audio recordings were performed to measure the percentage reduction in duration; they showed an average reduction of 27.34% with ANNs and 24.50% with SVMs. In addition, comprehension tests were performed on a reduced audio file created by the tool, which showed an information loss of 16.67%. It was concluded that prosodic and spectral parameters provide sufficient information to classify audio segments by relative importance. It was also found that combining prosodic and spectral parameters in the same data set yields better accuracy.
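The summary reports reduction percentages but does not state how they were computed. The following is a minimal Python sketch, not the authors' MATLAB implementation. It assumes that the tool keeps only the segments its classifier labels as important and concatenates them, and that the reduction is the removed duration divided by the original duration. The `Segment` class, the `shorten` and `concatenate` functions, and the per-segment `important` flag are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Segment:
    """A labelled audio segment (hypothetical structure, times in seconds)."""
    start: float
    end: float
    important: bool  # assumed output of the trained classifier (e.g. ANN or SVM)

    @property
    def duration(self) -> float:
        return self.end - self.start


def shorten(segments: Sequence[Segment]) -> Tuple[List[Segment], float]:
    """Keep the segments labelled important and return them together with
    the reduction percentage, assumed here to be removed / original duration."""
    total = sum(s.duration for s in segments)
    kept = [s for s in segments if s.important]
    kept_duration = sum(s.duration for s in kept)
    # Guard against an empty input to avoid dividing by zero.
    reduction = 100.0 * (total - kept_duration) / total if total > 0 else 0.0
    return kept, reduction


def concatenate(samples: Sequence[float], sample_rate: int,
                kept: Sequence[Segment]) -> List[float]:
    """Join the sample ranges of the kept segments into one shortened signal."""
    out: List[float] = []
    for s in kept:
        first = int(round(s.start * sample_rate))
        last = int(round(s.end * sample_rate))
        out.extend(samples[first:last])
    return out
```

For example, segments of 1.5 s (important), 0.5 s (not important) and 2.0 s (important) give a 4.0 s original and a 3.5 s result, that is, a 12.5% reduction. Per-recording values computed this way would then be averaged over the test recordings to obtain figures comparable to the averages in the summary.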