Extending Ginkgo to Manage Reconfigurable Hardware-Based Kernels
Although heterogeneous systems based on hardware accelerators are a trending topic in the HPC community, the trade-offs of reconfigurable hardware in linear algebra libraries for high-performance systems have not been studied in depth...
- Authors:
- Morales Peña, Alejandro; Meneses, Esteban
- Resource type:
- Research article
- Publication date:
- 2024
- Institution:
- Universidad Autónoma de Bucaramanga - UNAB
- Repository:
- Repositorio UNAB
- Language:
- spa
- OAI Identifier:
- oai:repository.unab.edu.co:20.500.12749/28300
- Keywords:
- High-Performance Computing (HPC); Ginkgo; FPGAs; SpMV
- Rights
- License
- http://purl.org/coar/access_right/c_abf2
id |
UNAB2_b3f1ff13f97d11f009ac8c77344393fd |
---|---|
oai_identifier_str |
oai:repository.unab.edu.co:20.500.12749/28300 |
network_acronym_str |
UNAB2 |
network_name_str |
Repositorio UNAB |
repository_id_str |
|
dc.title.spa.fl_str_mv |
Expansión de Ginkgo para administrar kernels reconfigurables basados en hardware |
dc.title.translated.eng.fl_str_mv |
Extending Ginkgo to Manage Reconfigurable Hardware-Based Kernels |
dc.creator.fl_str_mv |
Morales Peña, Alejandro; Meneses, Esteban |
dc.contributor.author.none.fl_str_mv |
Morales Peña, Alejandro; Meneses, Esteban |
dc.contributor.orcid.spa.fl_str_mv |
Morales Peña, Alejandro [0009-0004-9508-4266]; Meneses, Esteban [0000-0002-4307-6000] |
dc.subject.spa.fl_str_mv |
High-Performance Computing; Ginkgo; FPGAs; SpMV |
dc.subject.keywords.eng.fl_str_mv |
HPC; Ginkgo; FPGAs; SpMV |
description |
Although heterogeneous systems based on hardware accelerators are a trending topic in the HPC community, the trade-offs of reconfigurable hardware in linear algebra libraries for high-performance systems have not been studied in depth. In this research, we therefore aim to exploit the reconfigurability, adaptability, and reduced power consumption of FPGAs to generate FPGA-based kernels in Ginkgo, a specialized high-performance linear algebra library for many-core systems. We generated three FPGA-based SpMV kernels for the CSR, SELLP, and SELL formats and obtained speedups of at least 10x over the CPU-based kernels. Furthermore, a performance characterization study showed that FPGAs outperform general-purpose processors in terms of compute time. |
publishDate |
2024 |
dc.date.issued.none.fl_str_mv |
2024-06-18 |
dc.date.accessioned.none.fl_str_mv |
2025-02-14T13:52:06Z |
dc.date.available.none.fl_str_mv |
2025-02-14T13:52:06Z |
dc.type.coarversion.fl_str_mv |
http://purl.org/coar/version/c_970fb48d4fbd8a85 |
dc.type.driver.none.fl_str_mv |
info:eu-repo/semantics/article |
dc.type.local.spa.fl_str_mv |
Article |
dc.type.coar.none.fl_str_mv |
http://purl.org/coar/resource_type/c_2df8fbb1 |
dc.type.redcol.none.fl_str_mv |
http://purl.org/redcol/resource_type/ART |
format |
http://purl.org/coar/resource_type/c_2df8fbb1 |
dc.identifier.issn.spa.fl_str_mv |
1657-2831; 2539-2115 |
dc.identifier.uri.none.fl_str_mv |
http://hdl.handle.net/20.500.12749/28300 |
dc.identifier.instname.spa.fl_str_mv |
instname:Universidad Autónoma de Bucaramanga UNAB |
dc.identifier.repourl.spa.fl_str_mv |
repourl:https://repository.unab.edu.co |
dc.identifier.doi.none.fl_str_mv |
https://doi.org/10.29375/25392115.5276 |
dc.language.iso.spa.fl_str_mv |
spa |
language |
spa |
dc.relation.spa.fl_str_mv |
https://revistas.unab.edu.co/index.php/rcc/article/view/5276/4086 |
dc.relation.uri.spa.fl_str_mv |
https://revistas.unab.edu.co/index.php/rcc/issue/view/303 |
dc.rights.coar.fl_str_mv |
http://purl.org/coar/access_right/c_abf2 |
rights_invalid_str_mv |
http://purl.org/coar/access_right/c_abf2 |
dc.format.mimetype.spa.fl_str_mv |
application/pdf |
dc.publisher.spa.fl_str_mv |
Universidad Autónoma de Bucaramanga UNAB |
dc.source.spa.fl_str_mv |
Vol. 25 No. 2 (2024): Revista Colombiana de Computación (July-December); 43-58 |
institution |
Universidad Autónoma de Bucaramanga - UNAB |
bitstream.url.fl_str_mv |
https://repository.unab.edu.co/bitstream/20.500.12749/28300/1/Articulo%205.pdf https://repository.unab.edu.co/bitstream/20.500.12749/28300/2/license.txt https://repository.unab.edu.co/bitstream/20.500.12749/28300/3/Articulo%205.pdf.jpg |
bitstream.checksum.fl_str_mv |
672c5de0081db2f9cdca0fcaf82999cd 855f7d18ea80f5df821f7004dff2f316 1968381a0286a1160700d26c6e3df0d3 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 |
repository.name.fl_str_mv |
Repositorio Institucional | Universidad Autónoma de Bucaramanga - UNAB |
repository.mail.fl_str_mv |
repositorio@unab.edu.co |
_version_ |
1828219784825667584 |