Workflow for the ETL Process of the Bogotá Open Data Portal to Generate a More Readable and Cleaner Format Using the Databricks Tool

Authors:
Moreno Zuluaga, Jhon Alexander
Resource type:
Thesis
Publication date:
2024
Institution:
Universidad Antonio Nariño
Repository:
Repositorio UAN
Language:
spa
OAI Identifier:
oai:repositorio.uan.edu.co:123456789/12139
Online access:
https://repositorio.uan.edu.co/handle/123456789/12139
Keywords:
Data Governance
Data Analysis
Databricks
ETL
Integration
Interoperability
Rights
openAccess
License
Attribution-NonCommercial-NoDerivs 2.5 Colombia
Description
Summary: This project addresses the transformation and cleaning of the dataset “Confirmed COVID-19 Cases in Bogotá D.C.” from the Bogotá Open Data portal. These processes are guided by the capability areas established in the DAMA-DMBOK. The work applies best practices to improve data quality and organization, facilitating analysis and ensuring access to information through the Databricks tool, where a workflow guides the application of ETL processes and data governance best practices in the project. The project also demonstrates the capacity and scalability of Databricks in data integration and interoperability processes, as well as in data analysis.