Computer Architecture Simulation on GPU
ABSTRACT : Due to the power wall in computer architectures [33], processor manufacturers have adopted a strategy of increasing the number of cores on modern computers to enhance their performance and reduce power consumption [100][93][21]. As a result, research and innovation are now focused on inte...
- Authors:
- Buitrago Paniagua, John Byron
- Resource type:
- Doctoral thesis
- Publication date:
- 2024
- Institution:
- Universidad de Antioquia
- Repository:
- Repositorio UdeA
- Language:
- eng
- OAI Identifier:
- oai:bibliotecadigital.udea.edu.co:10495/38543
- Online access:
- https://hdl.handle.net/10495/38543
- Keywords:
- Simulación por computadores
- Computer simulation
- Métodos de simulación
- Simulation methods
- Arquitectura de computadores
- Computer architecture
- Rights
- embargoedAccess
- License
- https://creativecommons.org/licenses/by-nc-sa/4.0/
| id |
UDEA2_080ed90b8a592b57902861dd1439fffd |
|---|---|
| oai_identifier_str |
oai:bibliotecadigital.udea.edu.co:10495/38543 |
| network_acronym_str |
UDEA2 |
| network_name_str |
Repositorio UdeA |
| repository_id_str |
|
| dc.title.spa.fl_str_mv |
Computer Architecture Simulation on GPU |
| dc.title.translated.spa.fl_str_mv |
Simulación de Arquitectura de Computadores en GPUs |
| dc.creator.fl_str_mv |
Buitrago Paniagua, John Byron |
| dc.contributor.advisor.none.fl_str_mv |
Velasquez Ricardo, Rivera Fredy; Velásquez Vélez, Ricardo Andrés |
| dc.contributor.author.none.fl_str_mv |
Buitrago Paniagua, John Byron |
| dc.contributor.researchgroup.spa.fl_str_mv |
Sistemas Embebidos e Inteligencia Computacional (SISTEMIC) |
| dc.subject.lemb.none.fl_str_mv |
Simulación por computadores; Computer simulation; Métodos de simulación; Simulation methods; Arquitectura de computadores; Computer architecture |
| topic |
Simulación por computadores; Computer simulation; Métodos de simulación; Simulation methods; Arquitectura de computadores; Computer architecture |
| description |
ABSTRACT: Due to the power wall in computer architectures [33], processor manufacturers have adopted a strategy of increasing the number of cores in modern computers to enhance performance and reduce power consumption [100][93][21]. As a result, research and innovation now focus on integrating more cores on a single chip and improving communication and memory systems, which produces a complex design space for these architectures [93][110]. Because processor complexity is outpacing processor performance, computer architecture simulators tend to slow down over time [28][36], which calls for more advanced simulation strategies and tools. Computer architects have proposed various techniques to speed up their simulators by reducing their workload and, separately, by exploiting their parallelism [89][41][104][15][54][29][55][34]. This work aims to accelerate a detailed simulation of the uncore in a multicore architecture that supports multithreaded workloads. It proposes a methodology that integrates the core models technique [16], which abstracts core behavior and reduces the time needed to simulate the cores, with techniques that leverage the parallel capacity of GPU platforms to speed up the simulation. The methodology includes establishing a core model, a target uncore, their parallelization process, and their interaction. It also covers simulator task distribution and optimization of GPU process runtime. The proposed methodology is evaluated through a case study that simulates architectures from 1 to 128 cores to evaluate speedup scaling and estimate the simulation’s accuracy. The reference model is a sequential CPU GEM5-based version. Accuracy is assessed by comparing the simulation cycles and the number of misses for each cache in the architecture. 
The experimental results show that as the number of simulated cores increases, the GPU-based simulation becomes faster relative to the CPU version. The parallel simulator shows a speedup starting at 64 cores. In most of the results, the relative error in simulation cycles is below 10%. The accuracy of the miss counts is consistent across all cache levels, with a relative error below 5% in most cases. However, performance and accuracy depended on workload type: some simulations showed relative errors of around 40% in simulation cycles, and some benchmarks did not reach any speedup at 128 cores. |
| publishDate |
2024 |
| dc.date.accessioned.none.fl_str_mv |
2024-03-11T19:32:41Z |
| dc.date.available.none.fl_str_mv |
2024-03-11T19:32:41Z |
| dc.date.issued.none.fl_str_mv |
2024 |
| dc.type.spa.fl_str_mv |
Thesis/Degree work - Monograph - Doctorate |
| dc.type.coar.spa.fl_str_mv |
http://purl.org/coar/resource_type/c_db06 |
| dc.type.redcol.spa.fl_str_mv |
https://purl.org/redcol/resource_type/TD |
| dc.type.coarversion.spa.fl_str_mv |
http://purl.org/coar/version/c_b1a7d7d4d402bcce |
| dc.type.driver.spa.fl_str_mv |
info:eu-repo/semantics/doctoralThesis |
| dc.type.version.spa.fl_str_mv |
info:eu-repo/semantics/draft |
| format |
http://purl.org/coar/resource_type/c_db06 |
| status_str |
draft |
| dc.identifier.uri.none.fl_str_mv |
https://hdl.handle.net/10495/38543 |
| url |
https://hdl.handle.net/10495/38543 |
| dc.language.iso.spa.fl_str_mv |
eng |
| language |
eng |
| dc.rights.uri.spa.fl_str_mv |
https://creativecommons.org/licenses/by-nc-sa/4.0/ |
| dc.rights.uri.*.fl_str_mv |
http://creativecommons.org/licenses/by/2.5/co/ |
| dc.rights.accessrights.spa.fl_str_mv |
info:eu-repo/semantics/embargoedAccess |
| dc.rights.coar.spa.fl_str_mv |
http://purl.org/coar/access_right/c_f1cf |
| rights_invalid_str_mv |
https://creativecommons.org/licenses/by-nc-sa/4.0/ http://creativecommons.org/licenses/by/2.5/co/ http://purl.org/coar/access_right/c_f1cf |
| eu_rights_str_mv |
embargoedAccess |
| dc.format.extent.spa.fl_str_mv |
124 pages |
| dc.format.mimetype.spa.fl_str_mv |
application/pdf |
| dc.publisher.spa.fl_str_mv |
Universidad de Antioquia |
| dc.publisher.place.spa.fl_str_mv |
Medellín, Colombia |
| dc.publisher.faculty.spa.fl_str_mv |
Facultad de Ingeniería. Doctorado en Ingeniería Electrónica y de Computación |
| institution |
Universidad de Antioquia |
| bitstream.url.fl_str_mv |
https://bibliotecadigital.udea.edu.co/bitstreams/771fc3b5-31f6-4c88-a035-73484e02c33a/download https://bibliotecadigital.udea.edu.co/bitstreams/674ef0f2-20e7-41e0-8f86-21ee2f5ecf88/download https://bibliotecadigital.udea.edu.co/bitstreams/15568646-143a-4731-bcc1-9ff0de6959ab/download https://bibliotecadigital.udea.edu.co/bitstreams/46f5c82c-8782-4b5b-bfe1-1d5165d24329/download https://bibliotecadigital.udea.edu.co/bitstreams/59774f0e-8af0-45c9-ba28-29b2cffe2a84/download |
| bitstream.checksum.fl_str_mv |
1646d1f6b96dbbbc38035efc9239ac9c 8a4605be74aa9ea9d79846c1fba20a33 331ecb5fcb715540069034b9410b8317 e2716a0b3eab0217016a4a3274cb8863 c31e1f459db6e615db042b1f8cc23fd7 |
| bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 MD5 MD5 |
| repository.name.fl_str_mv |
Repositorio Institucional de la Universidad de Antioquia |
| repository.mail.fl_str_mv |
aplicacionbibliotecadigitalbiblioteca@udea.edu.co |
| _version_ |
1851052354530443264 |
