Output list
Conference proceeding
Towards Multimodal Large-Language Models for Parent-Child Interaction: A Focus on Joint Attention
Published 26/04/2025
Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, 1 - 6
CHI EA '25: Extended Abstracts of the CHI Conference on Human Factors in Computing Systems
Joint attention is a critical component of early speech-language development and a key indicator of effective parent-child interaction. However, research on detecting and analysing joint attention remains limited, particularly for Multimodal Large Language Models (MLLMs). This study evaluates MLLMs’ ability to comprehend joint attention by analysing 26 parent-child interaction videos annotated by two speech-language pathologists. These annotations identify strong and poor joint attention segments, serving as benchmarks for evaluating the models’ interpretive capabilities. Our findings reveal that current MLLMs struggle to accurately interpret joint attention due to a lack of nuanced understanding of child-initiated eye contact, a crucial component of joint attention dynamics. This study highlights the importance of incorporating detailed eye contact to enhance MLLMs’ multimodal reasoning. Addressing these gaps is essential for future research to advance the use of MLLMs in analysing and supporting parent-child interactions.
Conference proceeding
InfoTrace: A System for Information Campaign Source Tracing and Analysis on Social Media
Published 16/12/2024
Proceedings of the 24th ACM/IEEE Joint Conference on Digital Libraries, 1 - 3
JCDL '24: 24th ACM/IEEE Joint Conference on Digital Libraries
Social media platforms have become integral to daily life and serve as powerful channels for influencing public opinion, spanning applications from viral marketing to disinformation campaigns. While these platforms amplify the reach of marketing efforts, they also present significant risks when used for spreading misinformation via campaigns. Thus, it is crucial to understand such information campaigns by identifying the source, the different discussion topics within the campaign, and how they change over time. Towards addressing this problem, we develop and present an interactive system for information campaign source tracing and analysis. This includes a demonstration and visualization of the main components of an information campaign, including the source of the campaign via explicit and implicit links, the discussion topics/clusters, their content, and how these evolve over different time periods.
Conference proceeding
Towards Understanding Emotions for Engaged Mental Health Conversations
Published 01/07/2024
Companion Publication of the 2024 ACM Designing Interactive Systems Conference, 176 - 180
DIS '24: Designing Interactive Systems Conference
Providing timely support and intervention is crucial in mental health settings. As the need to engage youth comfortable with texting increases, mental health providers are exploring and adopting text-based media such as chatbots, community-based forums, online therapies with licensed professionals, and helplines operated by trained responders. To support these text-based media for mental health, particularly for crisis care, we are developing a system to perform passive emotion-sensing using a combination of keystroke dynamics and sentiment analysis. Our early studies of this system posit that the analysis of short text messages and keyboard typing patterns can provide emotion information that may be used to support both clients and responders. We use our preliminary findings to discuss the way forward for applying AI to support mental health providers in providing better care.
Conference proceeding
Analyzing Swimming Performance Using Drone Captured Aerial Videos
Published 03/06/2024
Proceedings of the 10th Workshop on Micro Aerial Vehicle Networks, Systems, and Applications, 7 - 12
MOBISYS '24: The 22nd Annual International Conference on Mobile Systems, Applications and Services
Monitoring swimmer performance is crucial for improving training and enhancing athletic techniques. Traditional methods for tracking swimmers, such as above-water and underwater cameras, face limitations due to the need for multiple cameras and obstructions from water splashes. This paper presents a novel approach for tracking swimmers using a moving UAV. The proposed system employs a UAV equipped with a high-resolution camera to capture aerial footage of the swimmers. The footage is then processed using computer vision algorithms to extract the swimmers' positions and movements. This approach offers several advantages, including single camera use and comprehensive coverage. The system's accuracy is evaluated with both training and in-competition videos. The results demonstrate the system's ability to accurately track swimmers' movements, limb angles, stroke duration and velocity, with a maximum error of 0.3 seconds for stroke duration and 0.35 m/s for velocity.
Conference proceeding
A Taxonomy for Human-LLM Interaction Modes: An Initial Exploration
Published 11/05/2024
Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, 1 - 11
With ChatGPT’s release, conversational prompting has become the most popular form of human-LLM interaction. However, its effectiveness is limited for more complex tasks involving reasoning, creativity, and iteration. Through a systematic analysis of HCI papers published since 2021, we identified four key phases in the human-LLM interaction flow—planning, facilitating, iterating, and testing—to precisely understand the dynamics of this process. Additionally, we have developed a taxonomy of four primary interaction modes: Mode 1: Standard Prompting, Mode 2: User Interface, Mode 3: Context-based, and Mode 4: Agent Facilitator. This taxonomy was further enriched using the “5W1H” guideline method, which involved a detailed examination of definitions, participant roles (Who), the phases that happened (When), human objectives and LLM abilities (What), and the mechanics of each interaction mode (How). We anticipate this taxonomy will contribute to the future design and evaluation of human-LLM interaction.
Conference proceeding
Published 01/01/2024
2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024, 6012 - 6025
Detecting hate speech and offensive language is essential for maintaining a safe and respectful digital environment. This study examines the limitations of state-of-the-art large language models (LLMs) in identifying offensive content within systematically perturbed data, with a focus on Chinese, a language particularly susceptible to such perturbations. We introduce ToxiCloakCN, an enhanced dataset derived from ToxiCN, augmented with homophonic substitutions and emoji transformations, to test the robustness of LLMs against these cloaking perturbations. Our findings reveal that existing models significantly underperform in detecting offensive content when these perturbations are applied. We provide an in-depth analysis of how different types of offensive content are affected by these perturbations and explore the alignment between human and model explanations of offensiveness. Our work highlights the urgent need for more advanced techniques in offensive language detection to combat the evolving tactics used to evade detection mechanisms.
Journal article
Published 29/11/2023
ACM Transactions on Computer-Human Interaction, 31, 1, 1 - 38
While AI-assisted individual qualitative analysis has been substantially studied, AI-assisted collaborative qualitative analysis (CQA), a process that involves multiple researchers working together to interpret data, remains relatively unexplored. After identifying CQA practices and design opportunities through formative interviews, we designed and implemented CoAIcoder, a tool leveraging AI to enhance human-to-human collaboration within CQA through four distinct collaboration methods. With a between-subject design, we evaluated CoAIcoder with 32 pairs of CQA-trained participants across common CQA phases under each collaboration method. Our findings suggest that while using a shared AI model as a mediator among coders could improve CQA efficiency and foster agreement more quickly in the early coding stage, it might affect the final code diversity. We also emphasize the need to consider the independence level when using AI to assist human-to-human collaboration in various CQA scenarios. Lastly, we suggest design implications for future AI-assisted CQA systems.
Conference proceeding
Assessing Programming Skills and Knowledge During the COVID-19 Pandemic: An Experience Report
Published 01/01/2021
Proceedings of the 26th ACM Conference on Innovation and Technology in Computer Science Education V. 1, 352 - 358
The current COVID-19 pandemic has resulted in disruption to the delivery of higher education. The government-mandated workplace closures that lasted for two months from April 2020 resulted in the closing of all university campuses in our city. This happened while our first-year introductory Python programming course was still in progress. We were thus unable to administer our final exam on campus. In this paper, we describe how our final exam, usually conducted on campus, was replaced with a performance-based assessment. This assessment tasked students with designing and programming their own game individually. After submitting their code, each student was then required to attend an oral exam that was administered online. We reflect on our experience, drawing from both instructors' and students' perspectives of the programming task and the assessment format. We conclude with a description of how the lessons learnt were applied to a subsequent run of the course.
Conference proceeding
User Perceptions and Adoption of Plug Load Management Systems in the Workplace
Published 01/01/2021
Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, 1 - 6
Smart energy management systems incorporate advanced sensing and control technologies that enable users to monitor and reduce their energy consumption through interactive visualisations and automated control features. Plug load management systems (PLMS), in particular, are applications of such systems targeting electrical devices found in homes and workplaces. While past studies have mostly focused on PLMS adoption in homes, the literature indicates several key differences in user motivation, as office users typically do not bear the cost of their consumption. This reduces their motivation to embrace such systems, resulting in low adoption rates. In our research, we examined user perception of adopting PLMS in the workplace through a series of focus group discussions and an online survey guided by findings from the focus group discussions. By analysing the quantitative and qualitative responses from 101 participants, we identified six design implications to guide the development of future PLMS in the workplace.
Conference proceeding
Pose Estimation for Facilitating Movement Learning from Online Videos
Published 01/01/2020
Proceedings of the International Conference on Advanced Visual Interfaces, 1 - 5
There exists a multitude of online video tutorials to teach physical movements such as exercises. Yet, users lack support to verify the accuracy of their movements when following such videos and have to rely on their own perception. To address this, we developed a web-based application that performs human pose estimation using video inputs from both the online video and the user's web camera, then provides different types of visual feedback to the user. Our study suggests that the user's skeleton overlaid on the user's camera feed improves user performance, whereas the user's skeleton on its own or the trainer's skeleton with the trainer video offered limited benefits. Our application demonstrates the potential to enhance learning physical movements from online videos and provides a basis for other guidance systems to design suitable visualizations.