Abstract
This paper identifies the basis of auditory similarity judgments among users of concatenative sound synthesis. Concatenative sound synthesis (CSS) is an established approach to creating new sounds from a user-supplied audio query. Typically, the audio is synthesised by selecting, for each query sound unit, the database sound unit at the least distance from it. However, sounds synthesised through this approach often leave users only moderately satisfied, because confusion between different audio perceptual attributes during the CSS system's matching process leads to mismatches. This study aims to determine the dominant perceptual attribute on which humans base their judgment of sound similarity. It also considers two categories of CSS users, musicians and non-musicians, and examines whether there is a significant difference between the two groups' subjective judgments of sound similarity. Thirty-eight participants took a listening test in which six pairwise comparisons drawn from four audio perceptual attributes (melody, timbre, tempo and loudness) were presented. In general, the majority of users in the musicians group (73.3%) based their judgment of sound similarity on the timbre attribute, whilst the majority of users in the non-musicians group (78.3%) based theirs on the melody attribute. This information may be used to help CSS systems cater to the expectations of their users and generate sounds with the closest matching audio perceptual attribute accordingly.
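As a minimal sketch of the distance-based matching step described above (not the authors' implementation), the following Python snippet selects the database unit nearest to a query unit. The four-dimensional feature layout (melody, timbre, tempo, loudness) and the optional per-attribute weights are illustrative assumptions, introduced only to show how a CSS system might emphasise one perceptual attribute over another.

```python
import numpy as np

def select_closest_unit(query_features, database_features, weights=None):
    """Return the index of the database unit nearest to the query unit.

    query_features    : shape (d,)   feature vector of the query sound unit
    database_features : shape (n, d) feature vectors of the n database units
    weights           : shape (d,)   optional per-attribute weights, e.g. to
                        emphasise timbre for musicians or melody for
                        non-musicians (hypothetical, following the findings)
    """
    if weights is None:
        weights = np.ones_like(query_features)
    # Weighted Euclidean distance between the query and every database unit
    diffs = (database_features - query_features) * weights
    distances = np.linalg.norm(diffs, axis=1)
    # Least-distance unit is chosen for concatenation
    return int(np.argmin(distances))

# Hypothetical usage: columns = (melody, timbre, tempo, loudness) descriptors.
query = np.array([0.2, 0.8, 0.5, 0.6])
database = np.array([
    [0.1, 0.9, 0.4, 0.7],
    [0.7, 0.2, 0.6, 0.5],
    [0.3, 0.8, 0.5, 0.4],
])
print(select_closest_unit(query, database))  # -> index of the closest unit
```

Weighting the attribute dimensions differently for the two user groups is one way the paper's finding could be folded into the matching step; the paper itself only reports the perceptual results.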
Original language | English |
---|---|
Pages (from-to) | 18039-18045 |
Number of pages | 7 |
Journal | ARPN Journal of Engineering and Applied Sciences |
Volume | 10 |
Issue number | 23 |
Publication status | Published - 1 Jan 2015 |