Towards a supervised rescoring system for unstructured data bases used to build specialized dictionaries

This article proposes the architecture for a system that uses previously learned weights to sort query results from unstructured data bases when building specialized dictionaries. A common resource in the construction of dictionaries, unstructured data bases have been especially useful in providing...

Authors:
Rico-Sulayes, Antonio
Resource type:
Publication date:
2014
Institution:
Universidad Pedagógica y Tecnológica de Colombia
Repository:
RiUPTC: Repositorio Institucional UPTC
Language:
eng
OAI Identifier:
oai:repositorio.uptc.edu.co:001/14109
Online access:
https://revistas.uptc.edu.co/index.php/ingenieria/article/view/3161
https://repositorio.uptc.edu.co/handle/001/14109
Keywords:
unstructured data bases
supervised rescoring
specialized lexicography
dictionary making
bases de datos no estructuradas
listas de hipótesis supervisadas
lexicografía especializada
construcción de diccionarios
Rights
License
http://purl.org/coar/access_right/c_abf321
id REPOUPTC2_7646d304d40b52d87968f0c578d3b613
oai_identifier_str oai:repositorio.uptc.edu.co:001/14109
network_acronym_str REPOUPTC2
network_name_str RiUPTC: Repositorio Institucional UPTC
repository_id_str
author Rico-Sulayes, Antonio
dc.title.en-US.fl_str_mv Towards a supervised rescoring system for unstructured data bases used to build specialized dictionaries
dc.title.es-ES.fl_str_mv Hacia un sistema de ponderación supervisado de bases de datos no estructuradas utilizadas en la construcción de diccionarios especializados
title Towards a supervised rescoring system for unstructured data bases used to build specialized dictionaries
dc.subject.en-US.fl_str_mv unstructured data bases
supervised rescoring
specialized lexicography
dictionary making
dc.subject.es-ES.fl_str_mv bases de datos no estructuradas
listas de hipótesis supervisadas
lexicografía especializada
construcción de diccionarios
description This article proposes the architecture for a system that uses previously learned weights to sort query results from unstructured data bases when building specialized dictionaries. A common resource in dictionary construction, unstructured data bases have been especially useful in providing information about the frequencies of lexical items and examples of their use. However, when building specialized dictionaries, whose selection of lexical items does not rely on frequency, these data bases are reduced to a simple source of examples. Even in this task, the information they provide may not be very useful when looking for specialized uses of lexical items that have several meanings and return very long lists of results. To address this problem, long lists of hits can be rescored with a supervised learning model that relies on previously helpful results. The collection of a vast set of high-quality training data for this rescoring system is reported here. Finally, the architecture of such a system, an unprecedented tool in specialized lexicography, is proposed.
publishDate 2014
dc.date.accessioned.none.fl_str_mv 2024-07-05T19:11:19Z
dc.date.available.none.fl_str_mv 2024-07-05T19:11:19Z
dc.date.none.fl_str_mv 2014-12-28
dc.type.en-US.fl_str_mv investigation
dc.type.es-ES.fl_str_mv investigación
dc.type.none.fl_str_mv info:eu-repo/semantics/article
dc.type.coar.fl_str_mv http://purl.org/coar/resource_type/c_2df8fbb1
dc.type.coarversion.fl_str_mv http://purl.org/coar/version/c_970fb48d4fbd8a85
dc.type.version.spa.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.coarversion.spa.fl_str_mv http://purl.org/coar/version/c_970fb48d4fbd8a404
status_str publishedVersion
dc.identifier.none.fl_str_mv https://revistas.uptc.edu.co/index.php/ingenieria/article/view/3161
10.19053/01211129.3161
dc.identifier.uri.none.fl_str_mv https://repositorio.uptc.edu.co/handle/001/14109
url https://revistas.uptc.edu.co/index.php/ingenieria/article/view/3161
https://repositorio.uptc.edu.co/handle/001/14109
identifier_str_mv 10.19053/01211129.3161
dc.language.none.fl_str_mv eng
dc.language.iso.spa.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv https://revistas.uptc.edu.co/index.php/ingenieria/article/view/3161/2853
https://revistas.uptc.edu.co/index.php/ingenieria/article/view/3161/4348
dc.rights.coar.fl_str_mv http://purl.org/coar/access_right/c_abf2
dc.rights.coar.spa.fl_str_mv http://purl.org/coar/access_right/c_abf321
rights_invalid_str_mv http://purl.org/coar/access_right/c_abf321
http://purl.org/coar/access_right/c_abf2
dc.format.none.fl_str_mv application/pdf
text/html
dc.publisher.en-US.fl_str_mv Universidad Pedagógica y Tecnológica de Colombia
dc.source.en-US.fl_str_mv Revista Facultad de Ingeniería; Vol. 24 No. 38 (2015); 97-106
dc.source.es-ES.fl_str_mv Revista Facultad de Ingeniería; Vol. 24 Núm. 38 (2015); 97-106
dc.source.none.fl_str_mv 2357-5328
0121-1129
institution Universidad Pedagógica y Tecnológica de Colombia
repository.name.fl_str_mv Repositorio Institucional UPTC
repository.mail.fl_str_mv repositorio.uptc@uptc.edu.co
_version_ 1839633873280958464