Towards Multimodal Large-Language Models for Parent-Child Interaction: A Focus on Joint Attention

Weiyan Shi; Hai Viet Le; Kenny Tsu Wei Choo; ACM

doi:10.1145/3706599.3720215

Conference proceeding

Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, pp.1-6

ACM Conferences

CHI EA '25: Extended Abstracts of the CHI Conference on Human Factors in Computing Systems

26/04/2025

Abstract

Computing methodologies -- Computer vision

Computing methodologies -- Natural language processing

Human-centered computing -- Empirical studies in collaborative and social computing

Joint attention is a critical component of early speech-language development and a key indicator of effective parent-child interaction. However, research on detecting and analysing joint attention remains limited, particularly for Multimodal Large Language Models (MLLMs). This study evaluates MLLMs’ ability to comprehend joint attention by analysing 26 parent-child interaction videos annotated by two speech-language pathologists. These annotations identify strong and poor joint attention segments, serving as benchmarks for evaluating the models’ interpretive capabilities. Our findings reveal that current MLLMs struggle to accurately interpret joint attention due to a lack of nuanced understanding of child-initiated eye contact, a crucial component of joint attention dynamics. This study highlights the importance of incorporating detailed eye-contact information to enhance MLLMs’ multimodal reasoning. Addressing these gaps is essential for future research to advance the use of MLLMs in analysing and supporting parent-child interactions.

Files and links: https://doi.org/10.1145/3706599.3720215 (Published, Version of record, Open)

Details

Creators: Weiyan Shi (Singapore University of Technology and Design); Hai Viet Le; Kenny Tsu Wei Choo (Singapore University of Technology and Design); ACM
Contributors: Naomi Yamashita (Kyoto University); Vanessa Evers (Nanyang Technological University); Koji Yatani (The University of Tokyo); Xianghua (Sharon) Ding (University of Glasgow)
Publisher: ACM
Number of pages: 6
Identifiers: 9912407609846
Academic Unit: ISTD Pillar
Language: English
