Exploratory data analysis in the context of data mining and resampling.
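Today there are quite a few widespread misconceptions of exploratory data analysis (EDA). One of these misperceptions is that EDA is said to be opposed to statistical modeling. Actually, the essence of EDA is not about putting aside all modeling and preconceptions; rather, researchers are urged not to start the analysis with a strong preconception only, and thus modeling is still legitimate in EDA. In addition, the nature of EDA has been changing due to the emergence of new methods and convergence between EDA and other methodologies, such as data mining and resampling. Therefore, conventional conceptual frameworks of EDA might no longer be capable of coping with this trend. In this article, EDA is introduced in the context of data mining and resampling with an emphasis on three goals: cluster detection, variable selection, and pattern recognition. TwoStep clustering, classification trees, and neural networks, which are powerful techniques to accomplish the preceding goals, respectively, are illustrated with concrete examples.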
- Authors: Ho Yu, Chong
- Resource type: Journal article
- Publication date: 2010
- Institution: Universidad de San Buenaventura
- Repository: Repositorio USB
- Language: eng
- OAI Identifier: oai:bibliotecadigital.usb.edu.co:10819/25698
- Online access: https://hdl.handle.net/10819/25698, https://doi.org/10.21500/20112084.819
- Keywords: exploratory data analysis, data mining, resampling, cross-validation, data visualization, clustering, classification trees, neural networks
- Rights: openAccess
- License: International Journal of Psychological Research - 2010
| Field | Value |
|---|---|
| id | SANBUENAV2_238c28030e644fcfbc806ea9fc39c194 |
| oai_identifier_str | oai:bibliotecadigital.usb.edu.co:10819/25698 |
| network_acronym_str | SANBUENAV2 |
| network_name_str | Repositorio USB |
| repository_id_str | |
| dc.title.spa.fl_str_mv | Exploratory data analysis in the context of data mining and resampling. |
| dc.title.translated.spa.fl_str_mv | Exploratory data analysis in the context of data mining and resampling. |
| title | Exploratory data analysis in the context of data mining and resampling. |
| title_short | Exploratory data analysis in the context of data mining and resampling. |
| title_full | Exploratory data analysis in the context of data mining and resampling. |
| title_fullStr | Exploratory data analysis in the context of data mining and resampling. |
| title_full_unstemmed | Exploratory data analysis in the context of data mining and resampling. |
| title_sort | Exploratory data analysis in the context of data mining and resampling. |
| dc.creator.fl_str_mv | Ho Yu, Chong |
| dc.contributor.author.eng.fl_str_mv | Ho Yu, Chong |
| dc.subject.eng.fl_str_mv | exploratory data analysis; data mining; resampling; cross-validation; data visualization; clustering; classification trees; neural networks |
| topic | exploratory data analysis; data mining; resampling; cross-validation; data visualization; clustering; classification trees; neural networks |
| description | Today there are quite a few widespread misconceptions of exploratory data analysis (EDA). One of these misperceptions is that EDA is said to be opposed to statistical modeling. Actually, the essence of EDA is not about putting aside all modeling and preconceptions; rather, researchers are urged not to start the analysis with a strong preconception only, and thus modeling is still legitimate in EDA. In addition, the nature of EDA has been changing due to the emergence of new methods and convergence between EDA and other methodologies, such as data mining and resampling. Therefore, conventional conceptual frameworks of EDA might no longer be capable of coping with this trend. In this article, EDA is introduced in the context of data mining and resampling with an emphasis on three goals: cluster detection, variable selection, and pattern recognition. TwoStep clustering, classification trees, and neural networks, which are powerful techniques to accomplish the preceding goals, respectively, are illustrated with concrete examples. |
| publishDate | 2010 |
| dc.date.accessioned.none.fl_str_mv | 2010-06-30T00:00:00Z; 2025-07-31T16:11:18Z |
| dc.date.available.none.fl_str_mv | 2010-06-30T00:00:00Z; 2025-07-31T16:11:18Z |
| dc.date.issued.none.fl_str_mv | 2010-06-30 |
| dc.type.spa.fl_str_mv | Journal article |
| dc.type.coar.fl_str_mv | http://purl.org/coar/resource_type/c_2df8fbb1 |
| dc.type.coar.eng.fl_str_mv | http://purl.org/coar/resource_type/c_6501 |
| dc.type.coarversion.eng.fl_str_mv | http://purl.org/coar/version/c_970fb48d4fbd8a85 |
| dc.type.content.eng.fl_str_mv | Text |
| dc.type.driver.eng.fl_str_mv | info:eu-repo/semantics/article |
| dc.type.local.eng.fl_str_mv | Journal article |
| dc.type.version.eng.fl_str_mv | info:eu-repo/semantics/publishedVersion |
| format | http://purl.org/coar/resource_type/c_6501 |
| status_str | publishedVersion |
| dc.identifier.doi.none.fl_str_mv | 10.21500/20112084.819 |
| dc.identifier.eissn.none.fl_str_mv | 2011-7922 |
| dc.identifier.issn.none.fl_str_mv | 2011-2084 |
| dc.identifier.uri.none.fl_str_mv | https://hdl.handle.net/10819/25698 |
| dc.identifier.url.none.fl_str_mv | https://doi.org/10.21500/20112084.819 |
| identifier_str_mv | 10.21500/20112084.819; 2011-7922; 2011-2084 |
| url | https://hdl.handle.net/10819/25698; https://doi.org/10.21500/20112084.819 |
| dc.language.iso.eng.fl_str_mv | eng |
| language | eng |
| dc.relation.bitstream.none.fl_str_mv | https://revistas.usb.edu.co/index.php/IJPR/article/download/819/595 |
| dc.relation.citationedition.eng.fl_str_mv | No. 1, Year 2010: Special Issue of Statistics in Psychology |
| dc.relation.citationendpage.none.fl_str_mv | 22 |
| dc.relation.citationissue.eng.fl_str_mv | 1 |
| dc.relation.citationstartpage.none.fl_str_mv | 9 |
| dc.relation.citationvolume.eng.fl_str_mv | 3 |
| dc.relation.ispartofjournal.eng.fl_str_mv | International Journal of Psychological Research |
| dc.relation.references.eng.fl_str_mv | Altman, D. G., & Royston, P. (2000). What do we mean by validating a prognostic model? Statistics in Medicine, 19, 453-473. Baker, B. D., & Richards, C. E. (1999). A comparison of conventional linear regression methods and neural networks for forecasting educational spending. Economics of Education Review, 18, 405-415. Behrens, J. T., & Yu, C. H. (2003). Exploratory data analysis. In J. A. Schinka & W. F. Velicer (Eds.), Handbook of psychology Volume 2: Research methods in Psychology (pp. 33-64). New Jersey: John Wiley & Sons, Inc. Behrens, J. T. (1997). Principles and procedures of exploratory data analysis. Psychological Methods, 2, 131-160. Berk, R. A. (2008). Statistical learning from a regression perspective. New York: Springer. Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Monterey, CA: Wadsworth International Group. Carpio, K. J. E., & Hermosilla, A. Y. (2002). On multicollinearity and artificial neural networks. Complexity International, 10. Retrieved October 8, 2009, from http://www.complexity.org.au/ci/vol10/hermos01/. |
| dc.rights.eng.fl_str_mv | International Journal of Psychological Research - 2010 |
| dc.rights.accessrights.eng.fl_str_mv | info:eu-repo/semantics/openAccess |
| dc.rights.coar.eng.fl_str_mv | http://purl.org/coar/access_right/c_abf2 |
| dc.rights.uri.eng.fl_str_mv | https://creativecommons.org/licenses/by-nc-sa/4.0/ |
| rights_invalid_str_mv | International Journal of Psychological Research - 2010; http://purl.org/coar/access_right/c_abf2; https://creativecommons.org/licenses/by-nc-sa/4.0/ |
| eu_rights_str_mv | openAccess |
| dc.format.mimetype.eng.fl_str_mv | application/pdf |
| dc.publisher.eng.fl_str_mv | Universidad San Buenaventura - USB (Colombia) |
| dc.source.eng.fl_str_mv | https://revistas.usb.edu.co/index.php/IJPR/article/view/819 |
| institution | Universidad de San Buenaventura |
| bitstream.url.fl_str_mv | https://bibliotecadigital.usb.edu.co/bitstreams/ea243411-1a1e-47e3-9c76-f4a033fd4e58/download |
| bitstream.checksum.fl_str_mv | 54633b1fe430f2951da3ab8ed44bfeb9 |
| bitstream.checksumAlgorithm.fl_str_mv | MD5 |
| repository.name.fl_str_mv | Repositorio Institucional Universidad de San Buenaventura Colombia |
| repository.mail.fl_str_mv | bdigital@metabiblioteca.com |
| _version_ | 1851053605582274560 |
