CVM 2025 Conference Program
- Please note:
- The workshop on Day 0 is free, but registration is required to attend the subsequent events.
- Unless otherwise specified, all events are held at LT-J (near Lift 33), HKUST (see the map under Visitor Info).
- On-site registration will be available during the daytime (around 9:00am-5:00pm) throughout the conference, outside LT-J.
Day 0 | Friday, April 18, 2025 (Hong Kong CVM 2025 Workshop) |
09:30 - 09:35 | Opening Remarks by Prof. Hongbo FU (HKUST) |
09:35 - 10:55 | Session 1 (20 minutes per talk) |
Prof. Ziqi Wang (HKUST) Robotic Assembly Planning via Reinforcement Learning |
|
Prof. Yuan Liu (HKUST) Incorporating CG pipeline with diffusion models for video generation |
|
Prof. Shenghua Gao (HKU) Geometry-aware 3D Reconstruction and Generation |
|
Prof. Long Chen (HKUST) Narrowing the Gaps: Towards Real-World Multimodal Reasoning & Generation Models |
|
11:00 - 12:20 | Session 2 (20 minutes per talk) |
Prof. Bo Yang (PolyU) 3D Physics Learning |
|
Prof. Shengdong Zhao (CityU) Heads-up Computing: Towards the Next Interaction Paradigm |
|
Prof. Xiaoyu Zhang (CityU) Visual Analytics for Computer Vision Model Validation |
|
Prof. Liwei Wang (CUHK) Learning from Videos to 3D LLMs |
|
12:20 - 14:00 | Lunch at China Garden (G/F) (by invitation only) |
14:00 - 15:20 | Session 3 (20 minutes per talk) |
Prof. Xiangyu Yue (CUHK) Towards Unified Multimodal Learning |
|
Prof. Yifan Peng (HKU) Towards Domain-specific Computational Imaging Systems: When Optics Meets Algorithms |
|
Prof. Dan Xu (HKUST) Open-World Perception, Modeling, and Editable Generation |
|
Prof. Hengshuang Zhao (HKU) Intelligent Visual Spatial Understanding and Reasoning |
|
15:25 - 16:25 | Session 4 (20 minutes per talk) |
Prof. Liangqiong Qu (HKU) Advancing Federated Learning via Heterogeneity Evaluation, Optimization, and Privacy Preservation |
|
Prof. Xiaojuan Qi (HKU) Learning 3D/4D Representations from Videos |
|
Prof. Tianfan Xue (CUHK) Computational photography in the age of foundation and generative models |
|
16:30 - 17:50 | Session 5 (20 minutes per talk) |
Prof. Xihui Liu (HKU) From Diffusion to Autoregression for Visual Generation |
|
Prof. Zhen Li (CUHK-SZ) Multimodal 3D Perception and Reasoning |
|
Prof. Zeyu Wang (HKUST-GZ) Toward Synergistic Human-AI Content Creation |
|
Prof. Junhui Hou (CityU) Dynamic 3D Content Reconstruction and Generation |
|
Day 1 | Saturday, April 19, 2025 |
09:00 - 09:20 | Opening Session (with HKUST AI Film Festival), located at Shaw Auditorium |
09:20 - 10:20 | Keynote Speech I - Prof. Maneesh Agrawala (with HKUST AI Film Festival), located at Shaw Auditorium |
10:30 - 10:50 | Tea Break |
10:50 - 12:10 | Paper Session 1: Geometric and Texture Reconstruction (Chair: Xiao-Ming Fu) |
Yu Chen, Hongwei Lin, Yifan Xing Human Perception Faithful Curve Reconstruction Based on Persistent Homology and Principal Curve |
|
Haichuan Song, Xinyi Chen FEDNet: A Feature-Enhanced Diffusion Network for Efficient and Universal Texture Synthesis |
|
Hongxiang Huang, Guoyuan An, Jingzhen Lan, Lingfei Wang, Rui Wang, Yuchi Huo Ultra-High Resolution Facial Texture Reconstruction from a Single Image |
|
Alakh Aggarwal, Ningna Wang, Xiaohu Guo TexHOI: Reconstructing Textures of 3D Unknown Objects in Monocular Hand-Object Interaction Scenes |
|
Ziqiang Dang, Wenqi Dong, Zesong Yang, Bangbang Yang, Liang Li, Yuewen Ma, Zhaopeng Cui TexPro: Text-guided PBR Texturing with Procedural Material Modeling |
|
12:10 - 13:30 | Lunch |
13:30 - 14:50 | Paper Session 2: Rendering (Chair: Lin Gao) |
Shi Mao, Chenming Wu, Zhelun Shen, Yifan Wang, Dayan Wu, Liangjun Zhang NeuS-PIR: Learning Relightable Neural Surface using Pre-Integrated Rendering |
|
Xiaowei Song, Ju Zheng, Shiran Yuan, Huan-ang Gao, Jingwei Zhao, Xiang He, Weihao Gu, Zhicheng Wang, Hao Zhao SA-GS: Scale-Adaptive Gaussian Splatting for Training-Free Anti-Aliasing |
|
Dongyu Chen, Haoxiang Chen, Qunce Xu, Tai-Jiang Mu RS-SpecSDF: Reflection-Supervised Surface Reconstruction and Material Estimation for Specular Indoor Scenes |
|
Qi-Yuan Feng, Hao-Xiang Chen, Qun-Ce Xu, Tai-Jiang Mu SLS4D: Sparse Latent Space for 4D Novel View Synthesis (invited TVCG paper presentation) |
|
Chenhui Wang, Jianyang Zhang, Chen Li, Changbo Wang DC-APIC: A Decomposed Compatible Affine Particle in Cell Transfer Scheme for Non-sticky Solid-Fluid Interactions in MPM |
|
14:50 - 15:40 | Paper Session 3: 3D Generation (Chair: Zhonggui Chen) |
Rengan Xie, Wenting Zheng, Kai Huang, Yizheng Chen, Qi Wang, Qi Ye, Wei Chen, Yuchi Huo LDM: Large Tensorial SDF Model for Textured Mesh Generation [supplementary] |
|
Chen Wang, Guangshun Wei, James Kit Hon Tsoi, Zhiming Cui, Shuyi Lu, Zhenpeng Liu, Yuanfeng Zhou Diff-OSGN: Diffusion-based Occlusal Surface Generation Network with Geometric Constraints |
|
Zhicong Tang, Shuyang Gu, Chunyu Wang, Ting Zhang, Jianmin Bao, Dong Chen, Baining Guo VolumeDiffusion: Feed-forward Text-to-3D Generation with Efficient Volumetric Encoder |
|
15:40 - 16:00 | Tea Break |
16:00 - 17:30 | Poster Session 1: Detection, Segmentation and Medical Imaging |
Zhiwei Dong, Genji Yuan, Jinjiang Li AGTCNet: Hybrid Network Based on AGT and Curvature Information for Skin Lesion Detection |
|
Xin Chi, Yu Sun, Yingjun Zhao, Donghua Lu, Jun Yang, Yiting Zhang A Comprehensive Framework for Fine-Grained Object Recognition in Remote Sensing |
|
Jiayong Zhu, Tao Zhang SEA-Net: A Severity-Aware Network with Visual Prompt Tuning for Underwater Semantic Segmentation |
|
Tingwei Wen, Yao Lu, Xiaosheng Chen, Xinhai Lu, Guangming Lu Among General Spine Segmentation with Multi-scale and Discriminate Feature Fusion |
|
Yiquan Wu, Zhongtian Wang, You Wu, Ling Huang, Hui Zhou, Shuiwang Li Towards Reflected Object Detection: A Benchmark |
|
Yujie Liu, Zhonghao Du, Xuanting Li, Zongmin Li, Jiayue Fan, Chaozhi Yang SSCL: A Spatial-Spectral and Commonality Learning Network for Semi-Supervised Medical Image Segmentation |
|
Lisha Cui, Helong Jiao, Tengyue Liu, Chunyan Niu, Ming Ma, Xiaoheng Jiang, Mingliang Xu LAGNet: A Location-Aware Guidance Network for Weak and Strip Defect Detection |
|
Di Zhou, Jiahui Li, Haiying Wang, Matthew Burns, Meng Liu Consensus-aware Balance Learning for Sexually Suggestive Video Classification |
|
Hanli Zhao, Binhao Wang, Wanglong Lu, Juncong Lin Degradation-Aware Frequency-Separated Transformer for Blind Super-Resolution |
|
Fuxian Sui, Hua Wang, Fan Zhang A Multiscale Edge-Guided Polynomial Approximation Network for Medical Image Segmentation |
|
Shiwei He, Yingjuan Jia, Hanpu Wang, Xinyu Liu, Jianmeng Zhou, Huijie Gu, Mengyan Li, Tong Chen Weighted Spatiotemporal Feature and Multi-task Learning for Masked Facial Expression Recognition |
|
Junsheng Chang, Qin Shi, Yijun Zhang, Zongtang Hu, Xulun Ye MAAU-UIE: Multiple Attention Aggregation U-Net for Underwater Image Enhancement |
|
Jinglong Tian, Tianze Zhao, Zhijun Fan, Linlin Shen, Jieyao Wei, Qiumei Pu MANet-CycleGAN: An Unsupervised LDCT Image Denoising Method Based on Channel Attention and Multi-Scale Features |
|
Zhou Yang, Hua Wang, Fan Zhang HIFNet: Medical Image Segmentation Network Utilizing Hierarchical Attention Feature Fusion |
|
Ke Xu, Min Li, Guangjian Liu, Chen Chen, Cheng Chen, Enguang Zuo, Xiaoyi Lv MBGNet: Mamba-Based Boundary-Guided Multimodal Medical Image Segmentation Network |
|
Longtao Chen, Jinjie Zheng, Fenglei Xu, Jing Lou, Huanqiang Zeng MSD: Mask-Guided and Semantic-Guided Diffusion-Based Framework for Stone Surface Defect Detection |
|
Wenzhe Meng, Xiaoliang Zhu, Yanxiang Li YNet: medical image segmentation model based on wavelet transform boundary enhancement |
|
Qichang Wang, Ruixia Liu A New Heterogeneous Mixture of Experts Model for Deepfake Detection |
|
Xinyu Yang, Xiaochen Ma, Xuekang Zhu, Bo Du, Lei Su, Bingkui Tong, Zeyu Lei, Jizhe Zhou M3: Manipulation Mask Manufacturer for Arbitrary-Scale Super-Resolution Mask |
|
Xinrong Hu, Chao Fang, Yu Chen, Kai Yang, Chun-Mei Feng, Ping Li Agent-Conditioned Multi-Contrast MRI Super-Resolution for Cross-Subject |
|
Xin Feng, Jie Wang, Siping Wang, Jiehui Zhang LightStar-Net: A Pseudo-Raw Space Enhancement for Efficient Low-Light Object Detection |
|
Zunwang Ke, YinFeng Wang, Run Guo, Minghua Du, Ji-Sheng Zhou, Gang Wang, Yugui Zhang An Effective Algorithm for Skin Disease Segmentation Combining inter-channel Features and Spatial Feature Enhancement |
|
Haodong Li, Haicheng Qu DASSF: Dynamic-Attention Scale-Sequence Fusion for Aerial Object Detection |
|
18:20 - 19:20 | AI Film Festival Screening, located at Shaw Auditorium |
19:30 - 21:00 | Reception (with HKUST AI Film Festival), located at Shaw Auditorium |
Day 2 | Sunday, April 20, 2025 |
09:00 - 10:00 | Keynote Speech II - Inference-time Guided Generation Using Diffusion and Flow Models (by Prof. Minhyuk Sung) |
10:00 - 10:20 | Tea Break |
10:20 - 11:30 | Paper Session 4: Image/video Enhancement (Chair: Xuequan Lu) |
Yong Liu, Qingji Dong, Chao Zhu, Yu Guo, Fei Wang Towards Real-world Image Dehazing: A Tailored Dehazing Method and A High-Quality Dataset |
|
Ji-Wei Wang, Li-Yong Shen Temporal-Spatial Fusion Transformer for Video Demoiréing |
|
Yue Zhao, Zhonggui Chen, Juan Cao Palette-based Color Transfer for Images and Videos |
|
Simin Kou, Fang-Lue Zhang, Jakob Nazarenus, Reinhard Koch, Neil A. Dodgson OmniPlane: A Recolorable Representation for Dynamic Scenes in Omnidirectional Videos (invited TVCG paper presentation) |
|
11:30 - 13:10 | Lunch |
13:10 - 14:30 | Paper Session 5: Multimedia Generation (Chair: Deng Yu) |
Sen Peng, Weixing Xie, Zilong Wang, Xiaohu Guo, Zhonggui Chen, Baorong Yang, Xiao Dong RMAvatar: Photorealistic Human Avatar Reconstruction from Monocular Video Based on Rectified Mesh-embedded Gaussians |
|
Chenxu Zhang, Chao Wang, Jianfeng Zhang, Hongyi Xu, Guoxian Song, You Xie, Linjie Luo, Yapeng Tian, Jiashi Feng, Xiaohu Guo MagicTalk: Implicit and Explicit Correlation Learning for Diffusion-based Emotional Talking Face Generation |
|
Boyao Ma, Yuanping Cao, Lei Zhang Decoupled Two-Stage Talking Head Generation via Gaussian-Landmark-Based Neural Radiance Fields |
|
Zian Wang, Shihao Zou, Shiyao Yu, Mingyuan Zhang, Chao Dong Semantics-Aware Human Motion Generation from Audio Instructions |
|
Xufei Guo, Xiao Dong, Juan Cao, Zhonggui Chen CADTrans: A Code Tree-Guided CAD Generative Transformer Model with Regularized Discrete Codebooks |
|
14:30 - 14:50 | Tea Break |
14:50 - 15:30 | Paper Session 6: Action Analysis (Chair: Pengfei Xu) |
Songmiao Wang, Ruize Han, Wei Feng Concept-Guided Open-Vocabulary Temporal Action Detection |
|
Yuqing Zhang, Chen Pang, Pei Geng, Xuequan Lu, Lei Lyu Multi-scale adaptive large kernel graph convolutional network based on skeleton-based recognition |
|
15:30 - 17:00 | Poster Session 2: Generative Models, 3D and Geometry |
Jingze Chen, Lei Li, Zerui Tang, Qiqin Lin, Junfeng Yao CMU-Flownet: Exploring Point Cloud Scene Flow Estimation in Occluded Scenario |
|
Ye Wang, Ruiqi Liu, Zili Yi, Tieru Wu, Rui Ma SingleDream: Attribute-Driven T2I Customization from a Single Reference Image |
|
Zhikun Wen, Honghua Chen, Zhe Zhu, Zeyong Wei, Liangliang Nan, Mingqiang Wei CosCAD: Cross-Modal CAD Model Retrieval and Pose Alignment from a Single Image |
|
Kangneng Zhou, Yaxing Wang, Shuang Song, Jie Zhang, Ping Li 3DFaceController: Region-Controllable Face Synthesis via Decomposed and Recomposed Neural Radiance Fields |
|
Pengfei Deng, Tianjiao Zhang, Weize Quan, Hanyu Wang, Qinglin Lu, Zhifeng Li, Dong-Ming Yan Concept-Edge Fusion: Background Generation for Product Presentation Based on Text-to-Image Model |
|
Yishuo Fei, Chao Chen, Haipeng Liao, Mo Chen, Yuhui Yang, Dongming Lu High-Quality and Efficient Inverse Rendering for Geometry, Material, and Illumination Reconstruction |
|
Ran Zuo, Haoxiang Hu, Xiaoming Deng, Yaokun Li, Yu-Kun Lai, Cuixia Ma, Yong-Jin Liu, Hongan Wang Sketch-Guided Scene-level Image Editing with Diffusion Models |
|
Hongyu Chen, Xiao-Diao Chen An efficient and robust tracing method based on matrix representation for surface-surface intersection |
|
Hao Yu, Ruian Wang, Longdu Liu, Shuangmin Chen, Shiqing Xin, Zhenyu Shu, Changhe Tu Completing Dental Models While Preserving Crown Geometry and Meshing Topology |
|
Ziqi Zeng, Chen Zhao, Weiling Cai, Yuqing Guo Semantic-guided Coarse-to-Fine Diffusion Model for Self-supervised Image Shadow Removal |
|
Yujing Sun, Caiyi Sun, Yuan Liu, Yuexin Ma, Siu Ming Yiu Extreme Two-View Geometry From Object Poses with Diffusion Models |
|
Yu-Jie Yuan, Leif Kobbelt, Jie Yang, Yu-Kun Lai, Lin Gao TPD-NeRF: Temporally Progressive Reconstruction of Dynamic Neural Radiance Fields from Monocular Video [supplementary] |
|
Longdu Liu, Hao Yu, Shiqing Xin, Shuangmin Chen, Hongwei Lin, Wenping Wang, Changhe Tu Direct Extraction of High-Quality and Feature-Preserving Triangle Meshes from Signed Distance Functions |
|
Xiangyu Su, Sida Peng, Oliver van Kaick, Hui Huang, Ruizhen Hu MTScan: Material Transfer from Partial Scans to CAD models |
|
Qifeng Chen, Kai Huang, Yuchi Huo, Qi Wang, Wenting Zheng, Rong Li, Rengan Xie HR Human: Modeling Human Avatars with Triangular Mesh and High-Resolution Textures from Videos |
|
Xinqi Liu, Chenming Wu VGA: Reconstructing Vivid 3D Gaussian Avatars from Monocular Videos |
|
Shan Yue, Hai Huang, Zhenqi Tang, Yutong Zheng, Zhou Fang TCDNet: Texture and Color Dynamic Network for Image Harmonization |
|
Fuyang Liu, Jianjun Li Unsupervised Monocular Depth Estimation for Foggy Images with Domain Separation and Self-depth Domain Conversion |
|
Qingyi Zhu, Ruochen Jin, Zhiwei Zhang, Yishen Xue, Xin Tan, Lizhuang Ma TAD: A plug-and-play Task Arithmetic approach for augmenting Diffusion models |
|
Yu Liu, Fatimah Khalid, Cunrui Wang, Mas Rina Mustaffa, Azreen Azman DiffVecFont: Fusing Dual-Mode Reconstruction Vector Fonts via Masked Diffusion Transformers |
|
Zhenzhen Xiao, Heng Liu, Bingwen Hu Unwarping Screen Content Images via Structure-texture Enhancement Network and Transformation Self-estimation |
|
Qun-Ce Xu, Yan-Pei Cao, Weihao Cheng, Tai-Jiang Mu, Ying Shan, Yong-Liang Yang, Shi-Min Hu High-accuracy Fractured Object Reassembly under Arbitrary Poses |
|
17:20 | Shuttle Bus Departure to the Banquet |
18:30 - 20:00 | Conference Banquet |
Day 3 | Monday, April 21, 2025 |
09:00 - 10:00 | Keynote Speech III - Simulate Everything, Everywhere, All At Once (by Prof. Eitan Grinspun) |
10:00 - 10:20 | Tea Break |
10:20 - 11:40 | Paper Session 7: Geometry Processing (Chair: Dong-Ming Yan) |
Di Shao, Yaping Jing, Xinkui Zhao, Shasha Mao, Lei Lyu, Xiao Liu, Xuequan Lu DS-MAE: Dual-Siamese Masked Autoencoders for Point Cloud Analysis |
|
Ao Zhang, Qing Fang, Peng Zhou, Xiao-Ming Fu Topology-Controlled Laplace-Beltrami Operator on Point Clouds Based on Persistent Homology |
|
Yuan-Yuan Cheng, Qing Fang, Ligang Liu, Xiao-Ming Fu Developable approximation via Isomap on Gauss image |
|
Gaoyang Zhang, Yingxi Chen, Hanchao Li, Xinguo Liu Efficient and Structure-Aware 3D Reconstruction via Differentiable Primitives Abstraction |
|
Jiachen Liu, Yuan Xue, Haomiao Ni, Rui Yu, Zihan Zhou, Sharon X. Huang Computer-Aided Layout Generation for Building Design: A Review |
|
11:40 - 13:00 | Lunch |
13:00 - 14:10 | Paper Session 8: Optimization and Application (Chair: Tai-Jiang Mu) |
Gang Xu, Haoyu Liu, Biao Leng, Zhang Xiong ImVoxelENet: Image to Voxels Epipolar Transformer for Multi-View RGB-based 3D Object Detection |
|
Yanchao Bi, Yang Ning, Xiushan Nie, Xiankai Lu, Ruiheng Zhang, Huanlong Zhang FGHDet: Delving Into Fine-grained Features With Head Selection for UAV Object Detection |
|
Siying Huang, Xin Yang, Zhengda Lu, Hongxing Qin, Huaiwen Zhang, Yiqun Wang L2-GNN: Graph Neural Networks with Fast Spectral Filters Using Twice Linear Parameterization |
|
Zihan Zhou, Jiacheng Pan, Xumeng Wang, Dongming Han, Fangzhou Guo, Minfeng Zhu, Wei Chen A Summarization-Based Pattern-Aware Matrix Reordering Approach [supplementary] |
|
14:10 - 14:30 | Tea Break |
14:30 - 16:00 | Poster Session 3: Multimodal Learning, Unsupervised Methods, and Applications |
Wenbin Wu, Zhiwei Zhang, Xin Tan, Zhizhong Zhang, Lizhuang Ma DepthFisheye: Efficient Fine-Tuning of Depth Estimation Models for Fisheye Cameras |
|
Yongbiao Gao, Xiangcheng Sun, Guohua Lv, Deng Yu, Sijie Niu Reinforced Label Denoising for Weakly-Supervised Audio-Visual Video Parsing |
|
Jiangnan Xia, Zhiyuan Zhang, Yanyin Guo, Qilong Wu, Yi Li, Jianghan Cheng, Junwei Li Bridging the Modality Gap: Advancing Multimodal Human Pose Estimation with Modality-Adaptive Pose Estimator and Novel Benchmark Datasets |
|
Xiaole Zhu, Zongtao Duan, Junchen Huang, Xing Sheng Momentum-Based Uni-Modal Soft-Label Alignment and Multi-Modal Latent Projection Networks for Optimizing Image-Text Retrieval |
|
Keyang Lin, Zhijun Fang, Sicong Zang, Hang Wu Learning Adaptive Basis Fonts to Fuse Content Features for Few-shot Font Generation |
|
Xiaoyu Guan, Yihao Li, Tianyu Huang TaiCrowd: A High-Performance Simulation Framework for Massive Crowd |
|
Wei Ge, Yongwei Nie, Fei Ma, Keke Tang, Fei Richard Yu, Hongmin Cai, Ping Li Training-Free Language-Guided Video Summarization via Multi-Grained Saliency Scoring |
|
Benchao Li, Yun Zou, Ruisheng Ran MCFG with GUMAP: A Simple and Effective Clustering Framework on Grassmann Manifold |
|
Yun Zou, Benchao Li, Ruisheng Ran Joint UMAP for Visualization of Time-Dependent Data |
|
Chengrong Yang, Qiwen Jin, Xiaoguo Zhang, Yujue Zhou Feature Disentanglement and Fusion Model for Multi-Source Domain Adaptation with Domain-Specific Features |
|
Hao Tong, Jiawei Liu, Yong Wu, Guozhi Zhao, Fanrui Zhang, Zheng-Jun Zha Multi-Granularity and Multi-Modal Prompt Learning for Person Re-Identification |
|
Lu Xu, Shuaixin Li, Xin Zhou, Xiaozhou Zhu, Wen Yao Local and Global Feature Cross-attention Multimodal Place Recognition |
|
Hongchao Zhong, Li Yu, Longkun Zou, Ke Chen Unsupervised Domain Adaptation on Point Cloud Classification via Imposing Structural Manifolds into Representation Space |
|
Kailang Hu, Yixiao Lu, Huibing Li, Xuan Song A Trademark Retrieval Method Based on Self-Supervised Learning |
|
Shu Liu, Melikamu Liyih Sinishaw, Luo Zheng DIMATrack: Dimension Aware Data Association for Multi-Object Tracking |
|
Qinghua Song, Xiaolei Wang Efficient Transformer Network for Visible and Ultraviolet Object Tracking |
|
Zheng Zhang, RuiQing Yang, ChuanLei Zhang IML-CMM - A Multimodal Sentiment Analysis Framework Integrating Intra-Modal Learning and Cross-Modal Mixup Enhancement |
|
Junjiang Liu, Dandan Sun, Hailun Xia, Jiangtao Bai, Xinyue Fan Weaken Noisy Feature: Boosting Semi-Supervised Learning by Noise Estimation |
|
Ruizhong Du, Luman Zhao, Mingyue Li, Yidan Li, Shenyu Li, Caixia Ma ADMMOA: Attribute-Driven Multimodal Optimization for Face Recognition Adversarial Attacks |
|
Mingming Li, Fei Wu, Yinjie Wang LightGR-Transformer: Light Grouped Residual Transformer for Multispectral Object Detection |
|
Weiye Peng, Shenghua Zhong Multi-Dimension Full Scene Integrated Visual Emotion Analysis Network |
|
Shan Huang, Wenhua Qian Gap-KD: Bridging the Significant Capacity Gap Between Teacher and Student Model |
|
16:00 - 16:30 | Closing Session |
Keynote Speakers
Prof. Maneesh Agrawala, Stanford University, USA
Title:
Beyond Unpredictable Black Boxes: Designing Generative AI For Iterative Refinement
Abstract:
Human creation of high-quality content often requires significant iteration. People produce a coarse initial draft and refine it step-by-step towards a final result. While modern generative AI tools are capable of producing surprisingly high-quality content from simple text prompts, they do not support this iterative refinement workflow. Instead, today's AI tools are black boxes, making it impossible for users to build a mental/conceptual model that can predict how an input prompt will be transmuted into output content. The lack of predictability forces users to rely on iterative trial-and-error, repeatedly crafting a prompt, using the AI to generate a result, and then adjusting the prompt to try again. In this talk, I'll outline some features generative AI tools should provide to support iterative refinement workflows rather than iterative trial-and-error. These features include hierarchical decomposition of the creation task and consistency of the output content from iteration to iteration. Finally, I'll suggest some approaches we might use to build generative AI tools that provide such features and demonstrate a few implementations of these ideas that we have developed in our lab at Stanford.
Speaker's Biography:
Maneesh Agrawala is the Forest Baskett Professor of Computer Science and Director of the Brown Institute for Media Innovation at Stanford University. He works on computer graphics, human-computer interaction, and visualization. His focus is on investigating how cognitive design principles can be used to improve the effectiveness of audio/visual media. The goals of this work are to discover the design principles and then instantiate them as constraints and controls in generative AI tools. Honors include an Okawa Foundation Research Grant (2006), an Alfred P. Sloan Foundation Fellowship (2007), an NSF CAREER Award (2007), a SIGGRAPH Significant New Researcher Award (2008), a MacArthur Foundation Fellowship (2009), an Allen Distinguished Investigator Award (2014) and induction into the SIGCHI Academy (2021). He was named an ACM Fellow in 2022.
Prof. Eitan Grinspun, University of Toronto, Canada
Title:
Simulate Everything, Everywhere, All At Once
Abstract:
Reduced order modeling (ROM) has long promised the ability to simulate complex dynamics at lightning speed—at the cost of specialization. Traditional ROMs are typically tied to a specific scenario, geometry, and discretization, making them brittle in the face of changing applications, shapes, or numerical methods.
If we could lift these restrictions, then we could unlock the potential of dramatically more expressive ROMs that can accelerate a wide range of simulations and applications. These ROMs could absorb data from disparate simulations—even those conducted on different discretizations—and generalize across diverse shapes.
I will present first steps toward these goals. Using neural fields to build continuous, differentiable representations of physical phenomena, we can learn a low-dimensional manifold of kinematic representations not tied to one shape or discretization. These new ROMs enable fast, accurate simulations that can train on—and generalize across—grids, meshes, and point clouds alike, and their generalization across shapes has exciting connections to spectral geometry processing. I believe that these approaches point to a new kind of simulation engine: one that is fast, general, and geometry-aware, bringing us one step closer to simulating everything, everywhere, all at once.
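To make the idea of a discretization-agnostic kinematic representation concrete, here is a minimal illustrative sketch (a toy PyTorch example under my own assumptions, not the speaker's code): a coordinate-based neural field maps a query point together with a low-dimensional latent (reduced) state to a displacement, so the same learned reduced space can be queried on a grid, a mesh, or a point cloud without re-discretization. The class and variable names below are hypothetical.

import torch

# Toy neural-field reduced kinematic model (illustrative only).
# A coordinate MLP maps (query point, low-dimensional latent code) -> displacement,
# so the reduced representation is independent of any particular mesh or grid.
class NeuralFieldROM(torch.nn.Module):
    def __init__(self, latent_dim=8, hidden=64):
        super().__init__()
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(3 + latent_dim, hidden), torch.nn.SiLU(),
            torch.nn.Linear(hidden, hidden), torch.nn.SiLU(),
            torch.nn.Linear(hidden, 3),  # displacement at the query point
        )

    def forward(self, points, z):
        # points: (N, 3) query locations from any discretization;
        # z: (latent_dim,) reduced kinematic state shared by all query points.
        z_rep = z.expand(points.shape[0], -1)
        return self.mlp(torch.cat([points, z_rep], dim=1))

field = NeuralFieldROM()
z = torch.zeros(8)                  # reduced coordinates a time integrator would evolve
grid_pts = torch.rand(1000, 3)      # e.g. nodes of a background grid
mesh_verts = torch.rand(250, 3)     # e.g. vertices of a surface mesh
disp_grid = field(grid_pts, z)      # the same latent state queried on a grid...
disp_mesh = field(mesh_verts, z)    # ...and on a mesh, with no re-meshing or resampling
print(disp_grid.shape, disp_mesh.shape)

In practice the latent code and the field weights would be fit to simulation data gathered from heterogeneous discretizations, which is what makes the reduced space reusable across them.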
Speaker's Biography:
Eitan Grinspun is Associate Chair, Communications, Mentoring and Inclusion at the Department of Computer Science at the University of Toronto, where he is also a Professor of Computer Science and Mathematics. He was previously the Co-Director of the Columbia Computer Graphics Group at Columbia University (2004-2021), Professeur d'Université Invité at l'Université Pierre et Marie Curie in Paris (2009), a Research Scientist at New York University's Courant Institute (2003-2004), a graduate student at the California Institute of Technology (1997-2003), and an undergraduate in Engineering Science at the University of Toronto (1993-1997). He has been an NVIDIA Fellow (2001), Eberhart Distinguished Lecturer (2003), NSF CAREER Awardee (2007), Alfred P. Sloan Research Fellow (2010-2012), one of Popular Science magazine's "Brilliant Ten Scientists" (2011), and Fast Company magazine's "Most Creative People in Business" (2013). Technologies developed by his lab are used in products such as Adobe Photoshop & Illustrator, at major film studios, and in soft matter physics and engineering research. He has been profiled in The New York Times, Scientific American, New Scientist, and mentioned in Variety. His film credits include The Hobbit, Rise of the Planet of the Apes, and Steven Spielberg's The Adventures of Tintin.
Prof. Minhyuk Sung, KAIST, South Korea
Title:
Inference-time Guided Generation Using Diffusion and Flow Models
Abstract:
Recent breakthroughs in generative AI have transformed the creative process, making it easier than ever to generate realistic images and videos. While the quality of generated outputs has reached unprecedented levels of realism, the challenge now lies in improving controllability and alignment with user preferences. Although text-to-image generative models have become prevalent, text input alone is often insufficient to provide precise spatial or stylistic control. Traditionally, users have provided such inputs through direct interactions, such as mouse clicks, but supporting these traditional input methods has become increasingly challenging. Moreover, despite the widespread use of text-to-image generation, text-image alignment remains far from perfect, particularly for complex prompts. To enhance controllability and alignment with user intent, recent advancements in LLMs have shifted focus beyond scaling training data and model size to scaling inference-time computation, as demonstrated by the AGI-level performance of models like GPT-4o and DeepSeek. In this talk, I will discuss inference-time generation techniques for guided image and video generation, categorized into three main approaches. First, noise manipulation leverages the observation that adjusting noise during the denoising process influences the final output, enabling alignment with user-defined spatial guidance. Second, gradient-descent-based algorithms utilize the expectation of the final output at an intermediate denoising step, which can be interpreted through the lens of score distillation and combined with it. Lastly, particle sampling exploits the stochastic nature of the generative process, branching out the generation to scale up and search for the desired output, albeit at the cost of increased inference computation. I will explore their capabilities, limitations, and future directions.
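To give a concrete flavor of the second family above (gradient-based guidance using the expectation of the final output at an intermediate denoising step), here is a minimal illustrative PyTorch sketch. It is a toy example under my own assumptions, not the speaker's implementation: the denoiser is an untrained placeholder standing in for a real diffusion model, and guidance_loss is a hypothetical user objective such as a spatial-layout constraint.

import torch

T = 50
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

# Placeholder epsilon-prediction network; a real model would also condition on text.
denoiser = torch.nn.Sequential(
    torch.nn.Linear(3, 64), torch.nn.SiLU(), torch.nn.Linear(64, 2)
)

def eps_pred(x_t, t):
    t_feat = torch.full((x_t.shape[0], 1), t / T)
    return denoiser(torch.cat([x_t, t_feat], dim=1))

def guidance_loss(x0_hat):
    # Hypothetical user-defined objective on the predicted clean sample.
    target = torch.tensor([[1.0, -1.0]])
    return ((x0_hat - target) ** 2).mean()

guidance_scale = 5.0
x_t = torch.randn(4, 2)  # start from pure noise
for t in reversed(range(T)):
    x_t = x_t.detach().requires_grad_(True)
    eps = eps_pred(x_t, t)
    a_bar = alpha_bars[t]
    # Expectation of the final (clean) output given the current noisy state.
    x0_hat = (x_t - (1 - a_bar).sqrt() * eps) / a_bar.sqrt()
    # Gradient of the guidance objective with respect to the current state.
    grad = torch.autograd.grad(guidance_loss(x0_hat), x_t)[0]
    with torch.no_grad():
        # Standard ancestral (DDPM-style) update ...
        mean = (x_t - betas[t] / (1 - a_bar).sqrt() * eps) / alphas[t].sqrt()
        noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
        x_t = mean + betas[t].sqrt() * noise
        # ... nudged downhill on the guidance loss (inference-time guidance).
        x_t = x_t - guidance_scale * grad
# x_t now approximates a sample pushed toward the guidance objective.

The same loop structure extends naturally to the particle-sampling family described above: maintain several x_t candidates in parallel and keep the branches with the lowest guidance loss, at the cost of extra inference computation.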
Speaker's Biography:
Minhyuk Sung is an Associate Professor in the School of Computing at KAIST, affiliated with the Graduate Schools of AI and Metaverse. Previously, he was a Research Scientist at Adobe Research. He earned his Ph.D. from Stanford University under Prof. Leonidas J. Guibas. His research focuses on generating, manipulating, and analyzing various visual data, including images, videos, and 3D data. He has served on program committees for SIGGRAPH Asia (2022-2025), Eurographics (2022, 2024-2025), Pacific Graphics (2023, 2025), ICCV (2025), ICLR (2025), and AAAI (2023, 2024). He received the Asia Graphics Young Researcher Award in 2024.