Extending Ginkgo to Manage Reconfigurable Hardware-Based Kernels

Although heterogeneous systems based on hardware accelerators are a trending topic in the HPC community, the trade-offs of reconfigurable-hardware accelerators in linear algebra libraries for high-performance systems have not been studied in depth. ...

Authors:
Morales Peña, Alejandro
Meneses, Esteban
Resource type:
Research article
Publication date:
2024
Institution:
Universidad Autónoma de Bucaramanga - UNAB
Repository:
Repositorio UNAB
Language:
Spanish
OAI Identifier:
oai:repository.unab.edu.co:20.500.12749/28300
Online access:
http://hdl.handle.net/20.500.12749/28300
https://doi.org/10.29375/25392115.5276
Keywords:
High-Performance Computing
HPC
Ginkgo
FPGAs
SpMV
Rights
License
http://purl.org/coar/access_right/c_abf2
id UNAB2_b3f1ff13f97d11f009ac8c77344393fd
oai_identifier_str oai:repository.unab.edu.co:20.500.12749/28300
network_acronym_str UNAB2
network_name_str Repositorio UNAB
repository_id_str
dc.title.spa.fl_str_mv Expansión de Ginkgo para administrar kernels reconfigurables basados en hardware
dc.title.translated.eng.fl_str_mv Extending Ginkgo to Manage Reconfigurable Hardware-Based Kernels
dc.creator.fl_str_mv Morales Peña, Alejandro
Meneses, Esteban
dc.contributor.author.none.fl_str_mv Morales Peña, Alejandro
Meneses, Esteban
dc.contributor.orcid.spa.fl_str_mv Morales Peña, Alejandro [0009-0004-9508-4266]
Meneses, Esteban [0000-0002-4307-6000]
dc.subject.spa.fl_str_mv Computación de Alto Rendimiento
Ginkgo
FPGAs
SpMV
dc.subject.keywords.eng.fl_str_mv HPC
Ginkgo
FPGAs
SpMV
description Although heterogeneous systems based on hardware accelerators are a trending topic in the HPC community, the trade-offs of reconfigurable-hardware accelerators in linear algebra libraries for high-performance systems have not been studied in depth. In this research, we therefore aim to take advantage of the reconfigurability, adaptability, and reduced power consumption of FPGAs to generate FPGA-based kernels in Ginkgo, a specialized high-performance linear algebra library for many-core systems. We generated three FPGA-based SpMV kernels for the CSR, SELLP, and SELL formats and obtained speedups of at least 10x over the CPU-based kernels. Furthermore, a performance characterization study showed that FPGAs outperform general-purpose processors in terms of compute time.
publishDate 2024
dc.date.issued.none.fl_str_mv 2024-06-18
dc.date.accessioned.none.fl_str_mv 2025-02-14T13:52:06Z
dc.date.available.none.fl_str_mv 2025-02-14T13:52:06Z
dc.type.coarversion.fl_str_mv http://purl.org/coar/version/c_970fb48d4fbd8a85
dc.type.driver.none.fl_str_mv info:eu-repo/semantics/article
dc.type.local.spa.fl_str_mv Artículo
dc.type.coar.none.fl_str_mv http://purl.org/coar/resource_type/c_2df8fbb1
dc.type.redcol.none.fl_str_mv http://purl.org/redcol/resource_type/ART
format http://purl.org/coar/resource_type/c_2df8fbb1
dc.identifier.issn.spa.fl_str_mv 1657-2831
2539-2115
dc.identifier.uri.none.fl_str_mv http://hdl.handle.net/20.500.12749/28300
dc.identifier.instname.spa.fl_str_mv instname:Universidad Autónoma de Bucaramanga UNAB
dc.identifier.repourl.spa.fl_str_mv repourl:https://repository.unab.edu.co
dc.identifier.doi.none.fl_str_mv https://doi.org/10.29375/25392115.5276
dc.language.iso.spa.fl_str_mv spa
language spa
dc.relation.spa.fl_str_mv https://revistas.unab.edu.co/index.php/rcc/article/view/5276/4086
dc.relation.uri.spa.fl_str_mv https://revistas.unab.edu.co/index.php/rcc/issue/view/303
dc.rights.coar.fl_str_mv http://purl.org/coar/access_right/c_abf2
rights_invalid_str_mv http://purl.org/coar/access_right/c_abf2
dc.format.mimetype.spa.fl_str_mv application/pdf
dc.publisher.spa.fl_str_mv Universidad Autónoma de Bucaramanga UNAB
dc.source.spa.fl_str_mv Vol. 25 Núm. 2 (2024): Revista Colombiana de Computación (Julio-Diciembre); 43-58
institution Universidad Autónoma de Bucaramanga - UNAB
bitstream.url.fl_str_mv https://repository.unab.edu.co/bitstream/20.500.12749/28300/1/Articulo%205.pdf
https://repository.unab.edu.co/bitstream/20.500.12749/28300/2/license.txt
https://repository.unab.edu.co/bitstream/20.500.12749/28300/3/Articulo%205.pdf.jpg
bitstream.checksum.fl_str_mv 672c5de0081db2f9cdca0fcaf82999cd
855f7d18ea80f5df821f7004dff2f316
1968381a0286a1160700d26c6e3df0d3
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
repository.name.fl_str_mv Repositorio Institucional | Universidad Autónoma de Bucaramanga - UNAB
repository.mail.fl_str_mv repositorio@unab.edu.co
_version_ 1828219784825667584