Denseflex: A Low Rank Factorization Methodology for Adaptable Dense Layers in DNNs

Milad Kokhazadeh, Georgios Keramidas, Vasilios Kelefouras, Iakovos Stamoulis

Research output: Contribution to conference › Conference paper (not formally published) › peer-review

Abstract

Low-Rank Factorization (LRF) is a popular compression technique used in Deep Neural Networks (DNNs). LRF can reduce both the memory size and the arithmetic operations of a DNN layer by approximating a weight tensor/matrix with two or more smaller tensors/matrices. Applying LRF to a DNN is challenging for several reasons. First, the exploration space is massive, and different solutions provide different trade-offs among memory, FLOPs, inference time, and validation accuracy; second, multiple DNN layers and multiple LRF algorithms must be considered; third, every extracted solution must undergo a calibration phase, which makes the LRF process time-consuming. In this paper, a methodology, called Denseflex, is presented that formulates the LRF problem as an inference time vs. FLOPs vs. memory vs. validation accuracy Design Space Exploration (DSE) problem. Moreover, to the best of our knowledge, this is the first work that proposes a methodology to efficiently combine two different LRF methods (Singular Value Decomposition -SVD- and Tensor Train Decomposition -TTD-) in the same framework. Denseflex is formulated as a design tool in which the user can provide specific memory, FLOPs, and/or execution time constraints, and the tool outputs a set of solutions that meet the given constraints while avoiding the time-consuming re-training phases. Our results indicate that our approach is able to prune the design space by 62% (on average) over related works for nine DNN models (up to 88% in AlexNet), while the extracted LRF solutions exhibit both lower memory footprints and lower execution times compared to the initial model.
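To illustrate the SVD-based flavor of LRF described in the abstract, the following minimal NumPy sketch factors a dense layer's weight matrix W into two smaller matrices by truncating its singular value decomposition. The dimensions, rank, and variable names are illustrative assumptions, not taken from Denseflex itself.

```python
import numpy as np

# Hypothetical dense-layer weight matrix W of shape (m, n); the rank r
# is a hand-picked truncation level, not a value chosen by Denseflex.
rng = np.random.default_rng(0)
m, n, r = 256, 128, 16

W = rng.standard_normal((m, n))
U, S, Vt = np.linalg.svd(W, full_matrices=False)

# Keep only the top-r singular components, so W is approximated as A @ B.
A = U[:, :r] * S[:r]   # shape (m, r)
B = Vt[:r, :]          # shape (r, n)

# Parameter counts before and after factorization; the factored form is
# smaller whenever r < m*n / (m + n).
params_orig = m * n            # 32768
params_lrf = m * r + r * n     # 6144
```

In a DNN this corresponds to replacing one dense layer with two consecutive smaller dense layers, which reduces both the memory footprint and the FLOPs of the layer; the resulting accuracy drop is what the calibration phase mentioned in the abstract then recovers.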

Original language: English
Pages: 21-31
Number of pages: 11
DOIs
Publication status: Published - 7 May 2024
Event: 21st ACM International Conference on Computing Frontiers - Ischia, Italy
Duration: 7 May 2024 - 9 May 2024
https://www.computingfrontiers.org/2024/program.html

Conference

Conference: 21st ACM International Conference on Computing Frontiers
Abbreviated title: CF '24
Country/Territory: Italy
City: Ischia
Period: 7/05/24 - 9/05/24

ASJC Scopus subject areas

  • General Computer Science

Keywords

  • Deep neural networks
  • Compression
  • Singular Value Decomposition
  • Tensor Train Decomposition
  • Design space exploration
