Grounding Word Meaning through Perception: Toward Compositional Language Understanding in Human-Robot Interaction

Research output: Contribution to conference, Conference paper (not formally published), peer-review

Abstract

For autonomous robots to interact naturally with humans, they must develop language understanding capabilities that connect linguistic expressions to multimodal perception. A key challenge arises when robots encounter lexical variations, such as synonyms or novel phrases, that were not observed during training. In this ongoing work, we present a multimodal word grounding framework that systematically integrates linguistic structures (word indices, part-of-speech tags, semantic word embeddings, and large language model representations) with perceptual features extracted from sensory data (object geometry, color, and spatial position given by centroids). Spatial relationships are learned through our Bayesian grounding model. We evaluate five experimental cases and demonstrate improved synonym generalization when using semantic embeddings. Although the framework effectively grounds individual words, it cannot handle more complex linguistic structures such as phrases or full sentences. We therefore discuss extending the framework toward compositional language understanding, from the word level to the phrase and sentence levels, with the aim of enabling robots to build linguistic knowledge in an unsupervised, bottom-up manner. This work contributes to advancing robot language understanding and generalization for natural human–robot interaction in dynamic environments.
Original language: English
Number of pages: 4
Publication status: Published - 30 Jun 2025
Event: IEEE International Conference on Robot & Human Interactive Communication (Ro-Man) - Eindhoven, Netherlands
Duration: 25 Aug 2025 to 29 Aug 2025
https://www.ro-man2025.org/

Conference

Conference: IEEE International Conference on Robot & Human Interactive Communication (Ro-Man)
City: Eindhoven
Period: 25/08/25 to 29/08/25