Adaptation, Comparison, and Improvement of Metaheuristic Algorithms to the Part-of-Speech Tagging Problem

Part-of-Speech Tagging (POST) is a complex task in the preprocessing of Natural Language Processing applications. Tagging has been tackled from statistical information and rule-based approaches, making use of a range of methods. Most recently, metaheuristic algorithms have gained attention while bei...

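Authors:
Solano-Jiménez, Miguel Alexis
Tobar-Cifuentes, Jose Julio
Sierra-Martínez, Luz Marina
Cobos-Lozada, Carlos Alberto
Resource type:
article
Publication date: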
2020
Institution:
Universidad Pedagógica y Tecnológica de Colombia
Repository:
RiUPTC: Repositorio Institucional UPTC
Language:
eng
spa
OAI Identifier:
oai:repositorio.uptc.edu.co:001/14291
Online access:
https://revistas.uptc.edu.co/index.php/ingenieria/article/view/11762
https://repositorio.uptc.edu.co/handle/001/14291
Keywords:
computational intelligence
computational linguistics
evolutionary computing
heuristic algorithms
natural language processing
parts of speech tagging
search methods
algoritmos heurísticos
computación evolutiva
etiquetado de partes del discurso
inteligencia computacional
lingüística computacional
métodos de búsqueda
procesamiento de lenguaje natural
Rights
License
http://purl.org/coar/access_right/c_abf151
id REPOUPTC2_69d2c2f3ee47c3f6811a45b77eef6f51
oai_identifier_str oai:repositorio.uptc.edu.co:001/14291
network_acronym_str REPOUPTC2
network_name_str RiUPTC: Repositorio Institucional UPTC
repository_id_str
dc.title.en-US.fl_str_mv Adaptation, Comparison, and Improvement of Metaheuristic Algorithms to the Part-of-Speech Tagging Problem
dc.title.es-ES.fl_str_mv Adaptación, comparación y mejora de algoritmos metaheurísticos al problema de etiquetado de partes del discurso
title Adaptation, Comparison, and Improvement of Metaheuristic Algorithms to the Part-of-Speech Tagging Problem
dc.subject.en-US.fl_str_mv computational intelligence
computational linguistics
evolutionary computing
heuristic algorithms
natural language processing
parts of speech tagging
search methods
topic computational intelligence
computational linguistics
evolutionary computing
heuristic algorithms
natural language processing
parts of speech tagging
search methods
algoritmos heurísticos
computación evolutiva
etiquetado de partes del discurso
inteligencia computacional
lingüística computacional
métodos de búsqueda
procesamiento de lenguaje natural
dc.subject.es-ES.fl_str_mv algoritmos heurísticos
computación evolutiva
etiquetado de partes del discurso
inteligencia computacional
lingüística computacional
métodos de búsqueda
procesamiento de lenguaje natural
description Part-of-Speech Tagging (POST) is a complex preprocessing task in Natural Language Processing applications. It has been tackled with approaches based on statistical information and on rules, using a range of methods. More recently, metaheuristic algorithms have gained attention and have been applied in a wide variety of knowledge areas with good results. They were therefore used in this research to solve the POST problem: assigning the best sequence of tags (roles) to the words of a sentence based on statistical information. The work was carried out in two cycles of four phases each, in which the following metaheuristic algorithms were adapted to the tagging problem: Particle Swarm Optimization, Jaya, Random-Restart Hill Climbing, and a memetic algorithm that uses Global-Best Harmony Search as the global optimizer and Hill Climbing as the local optimizer. Preliminary experiments (using cross-validation) were run to tune the parameters of each algorithm, which was then evaluated on the complete datasets of three tagged corpora: IULA (Spanish), Brown (English), and Nasa Yuwe (Nasa). The results of the proposed taggers were compared using the non-parametric Friedman and Wilcoxon statistical tests, which confirmed that the proposed memetic algorithm, GBHS Tagger, obtained better precision. The proposed taggers make an important contribution to POST for traditional languages (English and Spanish), non-traditional languages (Nasa Yuwe), and their application areas.
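The idea of treating tagging as a search over tag sequences can be illustrated with a minimal sketch of Random-Restart Hill Climbing, one of the algorithms named in the description. This is not the authors' implementation: the toy corpus, the function names (`train`, `fitness`, `hill_climb`, `rrhc_tag`), and the objective (a product of add-one smoothed emission and transition counts) are all assumptions made for illustration; the record only says the fitness is based on statistical information.

```python
import random
from collections import Counter

# Hypothetical toy corpus, for illustration only (not IULA, Brown, or Nasa Yuwe).
TAGGED = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]


def train(corpus):
    """Count word/tag emissions and tag-to-tag transitions."""
    emit, trans, tags = Counter(), Counter(), set()
    for sentence in corpus:
        prev = "<s>"  # sentence-start marker
        for word, tag in sentence:
            emit[(word, tag)] += 1
            trans[(prev, tag)] += 1
            tags.add(tag)
            prev = tag
    return emit, trans, sorted(tags)


def fitness(words, seq, emit, trans):
    """Assumed objective: product of add-one smoothed counts."""
    score, prev = 1.0, "<s>"
    for word, tag in zip(words, seq):
        score *= (emit[(word, tag)] + 1) * (trans[(prev, tag)] + 1)
        prev = tag
    return score


def hill_climb(words, start, tags, emit, trans):
    """Accept single-tag changes that raise the score until none does."""
    current = list(start)
    best = fitness(words, current, emit, trans)
    improved = True
    while improved:
        improved = False
        for i in range(len(words)):
            for tag in tags:
                if tag == current[i]:
                    continue
                candidate = current[:i] + [tag] + current[i + 1:]
                score = fitness(words, candidate, emit, trans)
                if score > best:
                    current, best, improved = candidate, score, True
    return current, best


def rrhc_tag(words, corpus, restarts=20, seed=0):
    """Run hill climbing from several random starts and keep the best result."""
    rng = random.Random(seed)
    emit, trans, tags = train(corpus)
    best_seq, best_score = None, float("-inf")
    for _ in range(restarts):
        start = [rng.choice(tags) for _ in words]
        seq, score = hill_climb(words, start, tags, emit, trans)
        if score > best_score:
            best_seq, best_score = seq, score
    return best_seq
```

On this toy corpus, `rrhc_tag(["the", "dog", "sleeps"], TAGGED)` returns `['DET', 'NOUN', 'VERB']`. A memetic variant such as the GBHS Tagger described above would replace the random restarts with a global optimizer and keep a local search like `hill_climb` as the refinement step.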
publishDate 2020
dc.date.accessioned.none.fl_str_mv 2024-07-05T19:11:56Z
dc.date.available.none.fl_str_mv 2024-07-05T19:11:56Z
dc.date.none.fl_str_mv 2020-09-18
dc.type.none.fl_str_mv info:eu-repo/semantics/article
dc.type.coar.fl_str_mv http://purl.org/coar/resource_type/c_2df8fbb1
dc.type.coarversion.fl_str_mv http://purl.org/coar/version/c_970fb48d4fbd8a85
dc.type.version.spa.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.coarversion.spa.fl_str_mv http://purl.org/coar/version/c_970fb48d4fbd8a234
status_str publishedVersion
dc.identifier.none.fl_str_mv https://revistas.uptc.edu.co/index.php/ingenieria/article/view/11762
10.19053/01211129.v29.n54.2020.11762
dc.identifier.uri.none.fl_str_mv https://repositorio.uptc.edu.co/handle/001/14291
url https://revistas.uptc.edu.co/index.php/ingenieria/article/view/11762
https://repositorio.uptc.edu.co/handle/001/14291
identifier_str_mv 10.19053/01211129.v29.n54.2020.11762
dc.language.none.fl_str_mv eng
spa
dc.language.iso.spa.fl_str_mv eng
spa
language eng
spa
dc.relation.none.fl_str_mv https://revistas.uptc.edu.co/index.php/ingenieria/article/view/11762/9627
https://revistas.uptc.edu.co/index.php/ingenieria/article/view/11762/9660
https://revistas.uptc.edu.co/index.php/ingenieria/article/view/11762/10015
dc.rights.coar.fl_str_mv http://purl.org/coar/access_right/c_abf2
dc.rights.coar.spa.fl_str_mv http://purl.org/coar/access_right/c_abf151
rights_invalid_str_mv http://purl.org/coar/access_right/c_abf151
http://purl.org/coar/access_right/c_abf2
dc.format.none.fl_str_mv application/pdf
application/pdf
application/xml
dc.publisher.en-US.fl_str_mv Universidad Pedagógica y Tecnológica de Colombia
dc.source.en-US.fl_str_mv Revista Facultad de Ingeniería; Vol. 29 No. 54 (2020): Continuous Publication; e11762
dc.source.es-ES.fl_str_mv Revista Facultad de Ingeniería; Vol. 29 Núm. 54 (2020): Publicación Continua; e11762
dc.source.none.fl_str_mv 2357-5328
0121-1129
institution Universidad Pedagógica y Tecnológica de Colombia
repository.name.fl_str_mv Repositorio Institucional UPTC
repository.mail.fl_str_mv repositorio.uptc@uptc.edu.co
_version_ 1839633823757762560
spelling La identificación de partes del discurso (Part-of-Speech Tagging, POST) es una tarea compleja en las aplicaciones de procesamiento de lenguaje natural. Ha sido abordada desde enfoques basados en información estadística y reglas, haciendo uso de distintos métodos y, últimamente, se destacan los algoritmos metaheurísticos obteniendo buenos resultados. Por ello, se involucran en esta investigación para asignar la mejor secuencia de etiquetas (roles) para las palabras de una oración, basándose en información estadística. Este proceso se desarrolló en 2 ciclos, donde cada ciclo tuvo 4 fases para la adaptación al problema de etiquetado en los algoritmos metaheurísticos Particle Swarm Optimization, Jaya, Random-Restart Hill Climbing, y un algoritmo memético basado en Global-Best Harmony Search como optimizador global, y en Hill Climbing como optimizador local. Se realizaron experimentos preliminares (utilizando validación cruzada), para ajustar los parámetros de cada algoritmo y luego ejecutarlos sobre los datasets completos de los corpus etiquetados IULA (castellano), Brown (inglés) y Nasa Yuwe (Nasa). Los resultados obtenidos por los etiquetadores propuestos se compararon mediante las pruebas estadísticas no paramétricas de Friedman y Wilcoxon, ratificando que el memético propuesto, GBHS Tagger, obtiene mejores resultados de precisión. Los etiquetadores propuestos se convierten en un aporte muy importante para el POST, tanto para lenguas tradicionales (Inglés y Castellano), no tradicionales (Nasa Yuwe), y sus áreas de aplicación.
Copyright (c) 2020 Miguel Alexis Solano-Jiménez, Jose Julio Tobar-Cifuentes, Luz Marina Sierra-Martínez, Ph. D., Carlos Alberto Cobos-Lozada, Ph. D.
Solano-Jiménez, Miguel Alexis
Tobar-Cifuentes, Jose Julio
Sierra-Martínez, Luz Marina
Cobos-Lozada, Carlos Alberto
2025-07-18 11:53:37.502
metadata.only
https://repositorio.uptc.edu.co