Design of Quantized Deep Neural Network Hardware Inference Accelerator Using Systolic Architecture
Abstract
This paper presents a hardware inference accelerator architecture for quantized deep neural networks (DNNs). The proposed accelerator implements all computations of a quantized DNN, including linear transformations such as matrix multiplication, nonlinear activation functions such as ReLU, and the quantization and dequantization operations. The accelerator consists of a matrix multiplication core, implemented as a systolic array, and a QDR core that computes the quantization, dequantization, and ReLU operations. The proposed architecture is described in Verilog Hardware Description Language (HDL) and simulated with ModelSim. For validation, we simulated the quantized DNN in Python and compared its results with those of the proposed hardware accelerator. The comparison shows only a very slight difference, confirming the validity of our quantized DNN hardware accelerator.
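The following is a minimal Python sketch of the quantize, integer matrix multiply, dequantize, and ReLU pipeline that the abstract describes, mirroring the kind of software reference model used for validation. The uniform affine 8-bit scheme, per-tensor scales, zero points of zero, and layer sizes are illustrative assumptions; the paper's actual quantization configuration is not specified on this page.

```python
# Sketch of the quantize -> integer matmul -> dequantize -> ReLU flow
# described in the abstract. Uniform affine 8-bit quantization with
# zero_point = 0 (symmetric) is an assumption, not the paper's scheme.
import numpy as np

def quantize(x, scale, zero_point):
    """Map float32 values to int8 with a uniform affine quantizer."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize(q, scale, zero_point):
    """Map int8 (or wider accumulator) values back to float32."""
    return scale * (q.astype(np.float32) - zero_point)

# Illustrative layer: y = ReLU(W @ x), computed in the integer domain.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8)).astype(np.float32)
x = rng.standard_normal((8, 1)).astype(np.float32)

s_w = np.abs(W).max() / 127  # symmetric per-tensor scales
s_x = np.abs(x).max() / 127
Wq, xq = quantize(W, s_w, 0), quantize(x, s_x, 0)

# Integer matrix multiply with a wide (int32) accumulator, as a systolic
# matrix multiplication core would produce; the combined output scale of
# the accumulator is s_w * s_x. Dequantize and apply ReLU afterward,
# i.e., the role of the QDR core.
acc = Wq.astype(np.int32) @ xq.astype(np.int32)
y = np.maximum(dequantize(acc, s_w * s_x, 0), 0.0)

# Compare against the float reference; the error should be very small.
print(np.abs(y - np.maximum(W @ x, 0.0)).max())
```

Comparing `y` against the float32 reference in this way reproduces, in spirit, the paper's validation step: the hardware and software outputs agree up to a small quantization error.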
Copyright (c) 2024 Dary Mochamad Rifqie, Yasser Abd Djawad, Faizal Arya Samman, Ansari Saleh Ahmar, M. Miftach Fakhri
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.