Overview
The emergence of multimodal large language models (MLLMs) has catalyzed a paradigm shift in human-centered vision. While traditional systems often struggle with the open-world complexity of human behavior, the integration of large-scale pre-training with cross-modal reasoning offers a path toward truly adaptive intelligence. This workshop addresses the critical transition from constrained benchmarks to in-the-wild deployment, where visual intelligence must navigate the ambiguity of human intent and the noise of unstructured environments.
The Human-Centered Multimodal Intelligence in the Wild (HCMIW) workshop bridges the gap between theoretical MLLMs and practical, human-aligned applications. We focus on the synergy between robust perception and sophisticated reasoning, aiming to redefine how vision systems interpret human presence across heterogeneous sensors and dynamic contexts.
To address these challenges, HCMIW invites contributions across five critical axes:
- MLLMs for Human-Centric Tasks: Novel paradigms for detection, tracking, and activity understanding, with emphasis on efficient adaptation and from-scratch training tailored for real-world constraints.
- Perception in Adverse Conditions: Advancing robustness against in-the-wild variables, including extreme lighting, rapid motion, severe occlusions, and complex human-object interactions.
- Heterogeneous Sensor Fusion: Moving beyond RGB by leveraging infrared, event-based cameras, LiDAR, and thermal sensing to build multi-spectral understanding of human activity.
- Generalization & Domain Transfer: Strategies for bridging the gap between massive general-purpose datasets and specialized, high-stakes human-centric domains.
- Next-Generation Benchmarking: Developing wild-first datasets and evaluation metrics that prioritize human-facing performance, safety, and reliability over static accuracy.
Topics of Interest
We welcome submissions and discussions related to, but not limited to, the following themes:
- Foundation models for human-centric perception, understanding, and reasoning.
- Multimodal learning for human behavior understanding, interaction analysis, and activity recognition.
- Perception in adverse conditions, including low light, occlusion, motion blur, and sensor noise.
- Cross-modal grounding across RGB, thermal, infrared, depth, event cameras, LiDAR, and audio.
- Generalization, adaptation, robustness, and reliability in real-world human-centered systems.
- Benchmarks, datasets, evaluation protocols, and deployment-oriented analysis for in-the-wild scenarios.
Invited Speakers
Zicheng Liu
Ronald Poppe
Kim Chang-Su
Program
We propose a half-day onsite workshop featuring a balanced program of invited talks, contributed papers, and interactive discussions. The detailed schedule will be announced at a later date.
Call for Papers
Important Dates
Submission Instructions
Submissions must follow the ECCV paper format: ECCV 2026 Author Guidelines.
LaTeX/Word templates: ECCV 2026 Paper Template.
Papers are limited to 14 pages in the ECCV style, including figures and tables. Additional pages containing only cited references are allowed. Papers longer than 14 pages, excluding references, will be rejected without review.
A paper should be submitted using the above templates. The length should match that intended for final publication.
Blind review: We adopt double-blind review for this workshop. Submitted papers and supplementary materials must not reveal any information that identifies the authors.
Dual submission: We do not accept papers that have already been published, including at the ECCV main conference, or that are under review at other conferences or workshops. Accepted papers are expected to be published in ECCV proceedings.
Submission Website
Submission site: https://openreview.net/group?id=thecvf.com/ECCV/2026/Workshop/HCMIW
Organizers
Zhigang Tu
Aoran Xiao
Angela Yao
Jürgen Gall
Hossein Rahmani
Jun Liu