Towards Multimodal Large-Language Models for Parent-Child Interaction: A Focus on Joint Attention
Conference proceeding

Weiyan Shi, Hai Viet Le and Kenny Tsu Wei Choo
Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, ACM, pp. 1-6
ACM Conferences
CHI EA '25: Extended Abstracts of the CHI Conference on Human Factors in Computing Systems
26/04/2025

Abstract

Computing methodologies -- Computer vision; Computing methodologies -- Natural language processing; Human-centered computing -- Empirical studies in collaborative and social computing
Joint attention is a critical component of early speech-language development and a key indicator of effective parent-child interaction. However, research on detecting and analysing joint attention remains limited, particularly with Multimodal Large Language Models (MLLMs). This study evaluates MLLMs' ability to comprehend joint attention by analysing 26 parent-child interaction videos annotated by two speech-language pathologists. The annotations identify segments of strong and poor joint attention, which serve as benchmarks for evaluating the models' interpretive capabilities. Our findings reveal that current MLLMs struggle to interpret joint attention accurately because they lack a nuanced understanding of child-initiated eye contact, a crucial component of joint attention dynamics. This study highlights the importance of incorporating detailed eye-contact information to enhance MLLMs' multimodal reasoning. Addressing these gaps is essential if future research is to advance the use of MLLMs in analysing and supporting parent-child interactions.
URL: https://doi.org/10.1145/3706599.3720215
Published (Version of record) Open