Design and Implementation of 2D Convolution on x86/x64 Processors

  • Aristotle University of Thessaloniki

Research output: Contribution to journal, Article, peer-review

Abstract

This paper presents a new method for accelerating direct 2D convolution on x86/x64 processors. It combines efficient vectorization with SIMD intrinsics, bit-twiddling optimizations, optimization of the division operation, multi-threading with OpenMP, register blocking, and the shortest possible bit width for intermediate results. The proposed method, which is provided as open source, is general and can also be applied to other processor families, e.g., Arm. It has been evaluated on two different multi-core Intel CPUs, using twenty different image sizes, 8-bit integer computations, and the most commonly used kernel sizes (3×3, 5×5, 7×7, 9×9). It achieves speedups from 2.8× to 40× over the Intel IPP library (OpenCV GaussianBlur and Filter2D routines), from 105× to 400× over the gemm-based convolution method (using the Intel MKL int8 matrix multiplication routine), and from 8.5× to 618× over the Intel MKL direct convolution routine vslsConvExec. The proposed method is faster because it executes far fewer arithmetic and load/store instructions.
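
Two of the ideas named in the abstract, a narrow intermediate bit width and an optimized division, can be illustrated with a minimal scalar sketch. This is not the authors' implementation: the names `recip16`, `div_by_recip`, and `conv3x3_u8` are hypothetical, the kernel is assumed to have non-negative 8-bit weights whose sum is at most 16, and SIMD intrinsics, OpenMP, and register blocking are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reciprocal m so that x / d == (x * m) >> 16.
   Assumption: exact for every x <= 255 * d when d <= 16; larger
   divisors would need a wider shift. */
static uint32_t recip16(uint32_t d) {
    assert(d > 0 && d <= 16);
    return 65536u / d + 1u;
}

/* Division by a fixed divisor as one multiply and one shift. */
static uint8_t div_by_recip(uint32_t x, uint32_t m) {
    return (uint8_t)((x * m) >> 16);
}

/* Direct 3x3 filtering of the interior of a w x h 8-bit image.
   The kernel is applied without flipping, which equals convolution
   for symmetric kernels. Border pixels of out are left untouched.
   The accumulator is 16 bits wide: 255 * 16 = 4080 fits easily. */
static void conv3x3_u8(const uint8_t *in, uint8_t *out, size_t w, size_t h,
                       const uint8_t k[3][3], uint32_t divisor) {
    const uint32_t m = recip16(divisor);
    for (size_t y = 1; y + 1 < h; ++y) {
        for (size_t x = 1; x + 1 < w; ++x) {
            uint16_t acc = 0;
            for (size_t i = 0; i < 3; ++i) {
                for (size_t j = 0; j < 3; ++j) {
                    acc = (uint16_t)(acc + in[(y + i - 1) * w + (x + j - 1)] * k[i][j]);
                }
            }
            out[y * w + x] = div_by_recip(acc, m);
        }
    }
}
```

Calling `conv3x3_u8` with the Gaussian weights {1,2,1; 2,4,2; 1,2,1} and divisor 16 on a constant image returns the same constant in the interior, which is a quick sanity check. Keeping the accumulator at 16 bits is what would let a vectorized version process twice as many pixels per register as a 32-bit accumulator.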
Original language: English
Pages (from-to): 3800-3815
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 14
Issue number: 8
Early online date: 29 Apr 2022
Publication status: Published - 1 Dec 2022