Design of Quantized Deep Neural Network Hardware Inference Accelerator Using Systolic Architecture
Abstract
This paper presents a hardware inference accelerator architecture for quantized deep neural networks (DNNs). The proposed accelerator implements all computations of a quantized DNN, including linear transformations such as matrix multiplication, nonlinear activation functions such as ReLU, and the quantization and dequantization operations. The accelerator consists of a matrix multiplication core, implemented as a systolic array, and a QDR core that computes the quantization, dequantization, and ReLU operations. The proposed architecture is described in Verilog Hardware Description Language (HDL) and simulated with ModelSim. For validation, we simulated the quantized DNN in Python and compared its results with those of the proposed hardware accelerator. The comparison shows only a very slight difference, confirming the validity of our quantized DNN hardware accelerator.
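The following is a minimal Python sketch of the quantize, integer matrix multiply, dequantize, and ReLU pipeline that the abstract describes, mirroring the kind of software reference model used for validation. The uniform affine 8-bit scheme, per-tensor scales, zero points of zero, and layer sizes are illustrative assumptions; the paper's actual quantization configuration is not specified on this page.

```python
# Sketch of the quantize -> integer matmul -> dequantize -> ReLU flow
# described in the abstract. Uniform affine 8-bit quantization with
# zero_point = 0 (symmetric) is an assumption, not the paper's scheme.
import numpy as np

def quantize(x, scale, zero_point):
    """Map float32 values to int8 with a uniform affine quantizer."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize(q, scale, zero_point):
    """Map int8 (or wider accumulator) values back to float32."""
    return scale * (q.astype(np.float32) - zero_point)

# Illustrative layer: y = ReLU(W @ x), computed in the integer domain.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8)).astype(np.float32)
x = rng.standard_normal((8, 1)).astype(np.float32)

s_w = np.abs(W).max() / 127  # symmetric per-tensor scales
s_x = np.abs(x).max() / 127
Wq, xq = quantize(W, s_w, 0), quantize(x, s_x, 0)

# Integer matrix multiply with a wide (int32) accumulator, as a systolic
# matrix multiplication core would produce; the combined output scale of
# the accumulator is s_w * s_x. Dequantize and apply ReLU afterward,
# i.e., the role of the QDR core.
acc = Wq.astype(np.int32) @ xq.astype(np.int32)
y = np.maximum(dequantize(acc, s_w * s_x, 0), 0.0)

# Compare against the float reference; the error should be very small.
print(np.abs(y - np.maximum(W @ x, 0.0)).max())
```

Comparing `y` against the float32 reference in this way reproduces, in spirit, the paper's validation step: the hardware and software outputs agree up to a small quantization error.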
Copyright (c) 2024 Dary Mochamad Rifqie, Yasser Abd Djawad, Faizal Arya Samman, Ansari Saleh Ahmar, M. Miftach Fakhri
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.