Exploratory data analysis in the context of data mining and resampling.

Authors:
Yu, Chong Ho
Resource type:
Journal article
Publication date:
2010
Institution:
Universidad de San Buenaventura
Repository:
Repositorio USB
Language:
English
OAI identifier:
oai:bibliotecadigital.usb.edu.co:10819/25698
Online access:
https://hdl.handle.net/10819/25698
https://doi.org/10.21500/20112084.819
Keywords:
exploratory data analysis
data mining
resampling
cross-validation
data visualization
clustering
classification trees
neural networks
Rights:
Open access
Copyright:
International Journal of Psychological Research - 2010
Abstract:
Today there are quite a few widespread misconceptions about exploratory data analysis (EDA). One of these misconceptions is that EDA is opposed to statistical modeling. In fact, the essence of EDA is not to set aside all modeling and preconceptions; rather, researchers are urged not to begin an analysis with only a strong preconception, so modeling remains legitimate in EDA. In addition, the nature of EDA has been changing due to the emergence of new methods and the convergence between EDA and other methodologies, such as data mining and resampling. Therefore, conventional conceptual frameworks of EDA might no longer be capable of coping with this trend. In this article, EDA is introduced in the context of data mining and resampling with an emphasis on three goals: cluster detection, variable selection, and pattern recognition. TwoStep clustering, classification trees, and neural networks, powerful techniques for accomplishing these three goals respectively, are illustrated with concrete examples.
Issue date:
2010-06-30
Journal:
International Journal of Psychological Research, Vol. 3, No. 1 (2010): Special Issue of Statistics in Psychology, pp. 9-22
ISSN:
2011-2084
eISSN:
2011-7922
Full text (PDF):
https://revistas.usb.edu.co/index.php/IJPR/article/download/819/595
Publisher:
Universidad San Buenaventura - USB (Colombia)
License:
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
Source:
https://revistas.usb.edu.co/index.php/IJPR/article/view/819