Abstract
Each individual perceives multimedia content differently, depending on their own interpretation. It is therefore challenging to predict the likability of multimedia from its content alone. This paper presents a novel system that analyzes the facial expressions of subjects in response to the multimedia content being evaluated. First, we developed a dataset by recording the facial expressions of subjects in an uncontrolled environment. The subjects were volunteers recruited to watch videos of different genres and provide feedback in terms of likability. Subject responses are divided into three categories: Like, Neutral, and Dislike. A novel multimodal system is then built on this dataset. The model learns feature representations from the data based on the three categories. The proposed system comprises an ensemble of a time-distributed convolutional neural network, a 3D convolutional neural network, and long short-term memory networks. All modalities in the proposed architecture are evaluated independently as well as in distinct combinations. The paper also provides detailed insight into the learning behavior of the proposed system.
Original language | English |
---|---|
Article number | 9504548 |
Pages (from-to) | 110421-110434 |
Number of pages | 14 |
Journal | IEEE Access |
Volume | 9 |
Publication status | Published - 2021 |
Externally published | Yes |
ASJC Scopus subject areas
- General Computer Science
- General Materials Science
- General Engineering
Keywords
- Affective computing
- deep neural architecture
- facial expression analysis
- multimedia evaluation system
- representation learning