Abstract
Low-Rank Factorization (LRF) is a popular compression technique for Deep Neural Networks (DNNs). LRF can reduce both the memory footprint and the number of arithmetic operations of a DNN layer by approximating a weight tensor/matrix with two or more smaller tensors/matrices. Applying LRF to DNNs is challenging for several reasons. First, the exploration space is massive, and different solutions provide different trade-offs among memory, FLOPs, inference time, and validation accuracy. Second, multiple DNN layers and multiple LRF algorithms must be considered. Third, every extracted solution must undergo a calibration phase, which makes the LRF process time-consuming. In this paper, a methodology called Denseflex is presented that formulates the LRF problem as an inference time vs. FLOPs vs. memory vs. validation accuracy Design Space Exploration (DSE) problem. Moreover, to the best of our knowledge, this is the first work that proposes a methodology to efficiently combine two different LRF methods, Singular Value Decomposition (SVD) and Tensor Train Decomposition (TTD), in the same framework. Denseflex is formulated as a design tool: the user provides specific memory, FLOPs, and/or execution time constraints, and the tool outputs a set of solutions that meet the given constraints while avoiding the time-consuming re-training phases. Our results indicate that our approach is able to prune the design space by 62% on average over related works for nine DNN models (up to 88% in AlexNet), while the extracted LRF solutions exhibit both lower memory footprints and lower execution times compared to the initial model.
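To make the basic idea behind LRF concrete, the following is a minimal sketch of factorizing one dense weight matrix with a truncated SVD and comparing the parameter counts. It is not the Denseflex implementation: the helper names (`svd_factorize`, `dense_cost`, `factorized_cost`), the layer shape, the chosen rank, and the synthetic weight matrix are all assumptions made for illustration.

```python
import numpy as np


def svd_factorize(W, rank):
    """Approximate W (m x n) by A (m x rank) @ B (rank x n) via truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]  # fold the kept singular values into the left factor
    B = Vt[:rank, :]
    return A, B


def dense_cost(m, n):
    """Parameters (and multiply-accumulates per input vector) of a dense m x n layer."""
    return m * n


def factorized_cost(m, n, rank):
    """Same cost measure for the two-factor replacement A @ B."""
    return rank * (m + n)


# Hypothetical layer shape and rank, chosen only for illustration.
m, n, rank = 512, 1024, 32

# Synthetic weight matrix: rank-32 structure plus small noise (an assumption,
# real DNN weights are usually only approximately low-rank).
rng = np.random.default_rng(0)
W = rng.standard_normal((m, rank)) @ rng.standard_normal((rank, n))
W += 0.01 * rng.standard_normal((m, n))

A, B = svd_factorize(W, rank)
rel_err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)

print("dense parameters:     ", dense_cost(m, n))
print("factorized parameters:", factorized_cost(m, n, rank))
print("relative error:       ", rel_err)
```

With these assumed sizes the factorized layer stores 32 × (512 + 1024) = 49,152 values instead of 512 × 1024 = 524,288. Choosing the rank per layer, and choosing between SVD and TTD, is what creates the large design space the abstract describes; a lower rank saves more memory and FLOPs but generally increases the approximation error.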
Original language | English |
---|---|
Pages | 21-31 |
Number of pages | 11 |
Publication status | Published - 7 May 2024 |
Event | 21st ACM International Conference on Computing Frontiers, Ischia, Italy. Duration: 7 May 2024 → 9 May 2024. https://www.computingfrontiers.org/2024/program.html |
Conference
Conference | 21st ACM International Conference on Computing Frontiers |
---|---|
Abbreviated title | CF '24 |
Country/Territory | Italy |
City | Ischia |
Period | 7/05/24 → 9/05/24 |
ASJC Scopus subject areas
- General Computer Science
Keywords
- Deep neural networks
- Compression
- Singular Value Decomposition
- Tensor Train Decomposition
- Design space exploration