Since September 2018, I have been a Lecturer (Assistant Professor) in the Department of Computing at the University of Plymouth. My research, in simple terms, lies in the broad field of optimizing software applications, e.g., Deep Neural Network (DNN) applications, in terms of execution time, energy consumption, and memory footprint. It spans a diverse spectrum of areas, including low-level hardware-dependent compiler optimizations and techniques for CPUs, GPUs, and FPGAs; high-level compression-based optimizations for DNNs, such as filter pruning, low-rank factorization, and quantization; task scheduling; and memory management strategies.
In 2013, I received my PhD from the Department of Electrical and Computer Engineering at the University of Patras, where I won the Greek PhD Research Scholarship. From September 2013 until December 2016 I worked as a postdoctoral researcher at the VLSI Lab, Department of Electrical and Computer Engineering, University of Patras. Additionally, from October 2015 until December 2016, I was a postdoctoral researcher at the Embedded System Design and Application Lab of the Technological Educational Institute of Western Greece. From January 2017 until December 2017 I was a Research Fellow in the Distributed Systems and Services Research Group, School of Computing, University of Leeds (UK). Finally, from December 2017 until September 2018 I worked as a Lecturer at Sheffield Hallam University.
Research Interests:
- Compiler Optimizations for High Performance Computing
- Optimizing Deep Neural Networks in terms of execution time and memory size
- Optimization of Matrix/Tensor Computations
- Loop Transformations
- Compression Techniques for Deep Neural Networks
- Data Movement Optimization in Cache Memories
- Task Scheduling for High Performance Computing
I have strong R&D experience in optimizing software applications, in terms of execution time, energy consumption, and memory size, on a wide range of hardware platforms including embedded systems, GPUs, and FPGAs.
I have published more than 50 research papers in high-quality journals and conferences, including IEEE and ACM transactions. I am currently supervising three PhD students.
Research Highlights:
Compiler Optimizations for accelerating Deep Neural Networks [1]: Convolution layers are the main performance bottleneck in many classes of Deep Neural Networks, and especially in Convolutional Neural Networks, which are widely used in artificial intelligence applications such as computer vision. In this research work [1], a novel analytical methodology is developed for very fast convolution layers on CPUs. The experimental results, which include 112 different convolution layers and two hardware platforms, show that the convolution layers of ResNet-50, DenseNet-121, and SqueezeNet are executed from 1.1x up to 7.2x faster than with the state-of-the-art Intel oneDNN library.
[1] V. Kelefouras and G. Keramidas, “Design and Implementation of Deep Learning 2D Convolutions on modern CPUs,” IEEE Transactions on Parallel and Distributed Systems, 2023
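To illustrate the computation that dominates these networks, a minimal direct 2D convolution (single input channel, single filter, unit stride, no padding) can be sketched as below. This is the naive loop nest that methodologies such as [1] restructure for locality and vectorization, not the optimized implementation itself; the function name and argument layout are illustrative.

```c
#include <assert.h>

/* Direct 2D convolution of an H x W input with a K x K kernel.
 * Unit stride, no padding: the output is (H-K+1) x (W-K+1).
 * Arrays are row-major. This is a reference (unoptimized) version. */
static void conv2d(const float *in, int H, int W,
                   const float *kern, int K, float *out)
{
    int Wo = W - K + 1;                       /* output width */
    for (int i = 0; i <= H - K; i++) {
        for (int j = 0; j <= W - K; j++) {
            float acc = 0.0f;
            /* multiply-accumulate over the K x K window */
            for (int ki = 0; ki < K; ki++)
                for (int kj = 0; kj < K; kj++)
                    acc += in[(i + ki) * W + (j + kj)]
                         * kern[ki * K + kj];
            out[i * Wo + j] = acc;
        }
    }
}
```

In a real CNN layer this loop nest is further nested over input/output channels and batch, which is why its memory access pattern, not just its arithmetic, determines performance on modern CPUs.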
Compiler Optimizations for accelerating Smoothing, Sharpening and Edge Detection image/video processing applications [2]: A novel methodology is developed [2] to speed up Smoothing, Sharpening and Edge Detection algorithms on CPUs. To accelerate such routines, the popular OpenCV library can use the optimized Intel IPP library on Intel CPUs (not by default; this must be specified at build time). Based on our experimental results, which include 20 different image sizes and two hardware platforms, the proposed methodology achieves from 2.8x to 40x speedup over the Intel IPP / OpenCV implementations of the GaussianBlur() and Filter2D() routines.
[2] V. Kelefouras and G. Keramidas, “Design and implementation of 2D convolution on x86/x64 processors,” IEEE Transactions on Parallel and Distributed Systems, 2022
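One standard optimization relevant to smoothing filters such as GaussianBlur() is kernel separability: a 2D Gaussian (or binomial) kernel is the outer product of two 1D kernels, so the filter can be applied as a horizontal pass followed by a vertical pass, reducing the per-pixel cost from K*K to 2K multiply-adds. The sketch below shows this general technique for a 3-tap binomial kernel; it is not the specific methodology of [2], and the names and buffer layout are illustrative.

```c
#include <assert.h>

/* Separable 3x3 binomial blur: the 2D kernel (1/16)*{1,2,1; 2,4,2; 1,2,1}
 * factors into the 1D kernel {1,2,1}/4 applied horizontally then vertically.
 * in:  H x W input image (row-major)
 * tmp: H x (W-2) caller-provided scratch buffer
 * out: (H-2) x (W-2) output (no padding, so borders shrink) */
static void blur3_separable(const float *in, int H, int W,
                            float *tmp, float *out)
{
    const float k[3] = {0.25f, 0.5f, 0.25f};  /* 1D factor, sums to 1 */
    int Wo = W - 2;

    /* horizontal pass: 3 multiply-adds per pixel */
    for (int i = 0; i < H; i++)
        for (int j = 0; j < Wo; j++)
            tmp[i * Wo + j] = k[0] * in[i * W + j]
                            + k[1] * in[i * W + j + 1]
                            + k[2] * in[i * W + j + 2];

    /* vertical pass: 3 more, instead of 9 for the full 2D kernel */
    for (int i = 0; i < H - 2; i++)
        for (int j = 0; j < Wo; j++)
            out[i * Wo + j] = k[0] * tmp[i * Wo + j]
                            + k[1] * tmp[(i + 1) * Wo + j]
                            + k[2] * tmp[(i + 2) * Wo + j];
}
```

Separability cuts arithmetic, but the two passes also change the data access pattern (the vertical pass strides through the scratch buffer), which is exactly the kind of trade-off that cache-aware methodologies must account for.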
Awards:
- Won the HiPEAC Technology Transfer Award (https://www.hipeac.net/awards/#/tech-transfer/2022/) for the proposal 'DSE methodology for Tensor Train Decomposition in NEOX AI-SDK'. The technology has been transferred to Think Silicon, a provider of ultra-low-power graphics processing units and Machine Learning accelerators.
- Won the Best Paper Award at SAMOS XXII (22nd International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation) for the paper entitled “A Design Space Exploration Methodology for Enabling Tensor Train Decomposition in Edge Devices”.
Currently, I am the module leader of the following modules:
- Parallel Computing (COMP3001)
- Computer Systems (COMP1001)
- Computing Practice (COMP1004)
Admin Roles:
- Undergraduate Admissions Tutor (International articulations and open days manager)
- Academic Liaison Person for Partner Colleges
- Equality, diversity and inclusion (EDI) committee member