Uncertainty clustering internal validity assessment using Fréchet distance for unsupervised learning

ABSTRACT: Knowing the number of clusters a priori is one of the most challenging aspects of unsupervised learning. Clustering Internal Validity Indices (CIVIs) evaluate partitions in unsupervised algorithms based on metrics like compactness, separation, and density. However, specialized CIVIs for sp...

Authors:
Rendón Hurtado, Nestor David
Ramírez García, Edison
Isaza Narváez, Claudia Victoria
Giraldo Zuluaga, Jhony Heriberto
Bouwmans, Thierry
Rodríguez Buriticá, Susana
Resource type:
Research article
Publication date:
2023
Institution:
Universidad de Antioquia
Repository:
Repositorio UdeA
Language:
eng
OAI Identifier:
oai:bibliotecadigital.udea.edu.co:10495/36087
Online access:
https://hdl.handle.net/10495/36087
Keywords:
Unsupervised learning
Clustering validity
Fréchet distance
Type-2 fuzzy sets
Rights
openAccess
License
http://creativecommons.org/licenses/by-nc-nd/2.5/co/
id UDEA2_f56388e56a765daced4a436dc3f86568
oai_identifier_str oai:bibliotecadigital.udea.edu.co:10495/36087
network_acronym_str UDEA2
network_name_str Repositorio UdeA
repository_id_str
dc.title.spa.fl_str_mv Uncertainty clustering internal validity assessment using Fréchet distance for unsupervised learning
title Uncertainty clustering internal validity assessment using Fréchet distance for unsupervised learning
spellingShingle Uncertainty clustering internal validity assessment using Fréchet distance for unsupervised learning
Unsupervised learning
Clustering validity
Fréchet distance
Type-2 fuzzy sets
title_short Uncertainty clustering internal validity assessment using Fréchet distance for unsupervised learning
title_full Uncertainty clustering internal validity assessment using Fréchet distance for unsupervised learning
title_fullStr Uncertainty clustering internal validity assessment using Fréchet distance for unsupervised learning
title_full_unstemmed Uncertainty clustering internal validity assessment using Fréchet distance for unsupervised learning
title_sort Uncertainty clustering internal validity assessment using Fréchet distance for unsupervised learning
dc.creator.fl_str_mv Rendón Hurtado, Nestor David
Ramírez García, Edison
Isaza Narváez, Claudia Victoria
Giraldo Zuluaga, Jhony Heriberto
Bouwmans, Thierry
Rodríguez Buriticá, Susana
dc.contributor.author.none.fl_str_mv Rendón Hurtado, Nestor David
Ramírez García, Edison
Isaza Narváez, Claudia Victoria
Giraldo Zuluaga, Jhony Heriberto
Bouwmans, Thierry
Rodríguez Buriticá, Susana
dc.contributor.researchgroup.spa.fl_str_mv Sistemas Embebidos e Inteligencia Computacional (SISTEMIC)
dc.subject.proposal.spa.fl_str_mv Unsupervised learning
Clustering validity
Fréchet distance
Type-2 fuzzy sets
topic Unsupervised learning
Clustering validity
Fréchet distance
Type-2 fuzzy sets
description ABSTRACT: Knowing the number of clusters a priori is one of the most challenging aspects of unsupervised learning. Clustering Internal Validity Indices (CIVIs) evaluate partitions in unsupervised algorithms based on metrics like compactness, separation, and density. However, CIVIs have been designed for specific applications, and there is no general CIVI that works in all scenarios. The absence of CIVIs based on crisp uncertainty metrics is especially critical in decision-making processes that involve ambiguity, non-convex distributions, outliers, and overlapping data. To address this problem, we propose a novel Uncertainty Fréchet (UF) CIVI that assesses the certainty of a well-defined partition. UF leverages uncertainty fingerprints based on Type-2 fuzzy Gaussian Mixture Models (T2FGMM) and the Fréchet distance between clusters to introduce a metric that evaluates partition quality. We integrate UF into a merging methodology that combines similar clusters within a partition, allowing us to determine the number of clusters without running the clustering algorithms iteratively, as other CIVIs require. We evaluate our proposal on 5,250 convex and 36 non-convex synthetic datasets, and on five benchmark real datasets. In addition, we apply UF in a real-world scenario that involves high uncertainty: Passive Acoustic Monitoring (PAM) of ecosystems, which aims to study ecological transformations through acoustic recordings. The results show that UF performs well in synthetic and real-world scenarios, obtaining an Adjusted Mutual Information (AMI) score higher than 0.88 for normal, uniform, gamma, and triangular distribution datasets. In the PAM application, clustering algorithms combined with UF identify the transformation of ecosystems through sound, achieving an F1 score of 0.84. These results indicate that the UF index is a suitable tool for researchers and practitioners working with highly uncertain data.
publishDate 2023
dc.date.accessioned.none.fl_str_mv 2023-07-31T16:56:47Z
dc.date.available.none.fl_str_mv 2023-07-31T16:56:47Z
dc.date.issued.none.fl_str_mv 2023
dc.type.spa.fl_str_mv Research article
dc.type.coar.spa.fl_str_mv http://purl.org/coar/resource_type/c_2df8fbb1
dc.type.redcol.spa.fl_str_mv https://purl.org/redcol/resource_type/ART
dc.type.coarversion.spa.fl_str_mv http://purl.org/coar/version/c_970fb48d4fbd8a85
dc.type.driver.spa.fl_str_mv info:eu-repo/semantics/article
dc.type.version.spa.fl_str_mv info:eu-repo/semantics/publishedVersion
format http://purl.org/coar/resource_type/c_2df8fbb1
status_str publishedVersion
dc.identifier.issn.none.fl_str_mv 0952-1976
dc.identifier.uri.none.fl_str_mv https://hdl.handle.net/10495/36087
dc.identifier.doi.none.fl_str_mv 10.1016/j.engappai.2023.106635
dc.identifier.eissn.none.fl_str_mv 1873-6769
identifier_str_mv 0952-1976
10.1016/j.engappai.2023.106635
1873-6769
url https://hdl.handle.net/10495/36087
dc.language.iso.spa.fl_str_mv eng
language eng
dc.relation.ispartofjournalabbrev.spa.fl_str_mv Eng. Appl. Artif. Intell.
dc.relation.citationendpage.spa.fl_str_mv 14
dc.relation.citationstartpage.spa.fl_str_mv 1
dc.relation.citationvolume.spa.fl_str_mv 124
dc.relation.ispartofjournal.spa.fl_str_mv Engineering Applications of Artificial Intelligence
dc.rights.uri.*.fl_str_mv http://creativecommons.org/licenses/by-nc-nd/2.5/co/
dc.rights.uri.spa.fl_str_mv https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.rights.accessrights.spa.fl_str_mv info:eu-repo/semantics/openAccess
dc.rights.coar.spa.fl_str_mv http://purl.org/coar/access_right/c_abf2
rights_invalid_str_mv http://creativecommons.org/licenses/by-nc-nd/2.5/co/
https://creativecommons.org/licenses/by-nc-nd/4.0/
http://purl.org/coar/access_right/c_abf2
eu_rights_str_mv openAccess
dc.format.extent.spa.fl_str_mv 14
dc.format.mimetype.spa.fl_str_mv application/pdf
dc.publisher.spa.fl_str_mv Elsevier
dc.publisher.place.spa.fl_str_mv Swansea, United Kingdom
institution Universidad de Antioquia
bitstream.url.fl_str_mv https://bibliotecadigital.udea.edu.co/bitstreams/86d1334d-84f6-4fa1-a7ee-44a71b6b7957/download
https://bibliotecadigital.udea.edu.co/bitstreams/e0ab8540-9ab8-4d43-8c6b-0f1fea794875/download
https://bibliotecadigital.udea.edu.co/bitstreams/8b4f0bde-bde1-45ea-9338-6abde3d7fc54/download
https://bibliotecadigital.udea.edu.co/bitstreams/24b8952c-bbd2-4afb-ab4a-43452e7530bf/download
https://bibliotecadigital.udea.edu.co/bitstreams/260066d0-8708-4e35-8e42-ca846d477de7/download
bitstream.checksum.fl_str_mv 610419c4209344e30e84a82c1e7d02c9
b88b088d9957e670ce3b3fbe2eedbc13
8a4605be74aa9ea9d79846c1fba20a33
c79087ac6c0e3dae426eac7d3cdbf637
dd7c609e939547a289f8d40ab1205a11
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
MD5
repository.name.fl_str_mv Repositorio Institucional de la Universidad de Antioquia
repository.mail.fl_str_mv aplicacionbibliotecadigitalbiblioteca@udea.edu.co
_version_ 1851052431379529728