CVM 2025 Conference Program
- Please note:
- The workshop on Day 0 is free, but registration is required to attend the subsequent events.
- Unless otherwise specified, all events are held at LT-J (near Lift 33), HKUST (see the map under Visitor Info).
- On-site registration will be available during the daytime (around 9:00am-5:00pm) throughout the conference, outside LT-J.
Day 0 | Friday, April 18, 2025 (Hong Kong CVM 2025 Workshop) |
09:30 - 09:35 | Opening Remarks by Prof. Hongbo FU (HKUST) |
09:35 - 10:55 | Session 1 (20 minutes per talk) |
Prof. Ziqi Wang (HKUST) Robotic Assembly Planning via Reinforcement Learning |
|
Prof. Yuan Liu (HKUST) Incorporating CG pipeline with diffusion models for video generation |
|
Prof. Shenghua Gao (HKU) Geometry-aware 3D Reconstruction and Generation |
|
Prof. Long Chen (HKUST) Narrowing the Gaps: Towards Real-World Multimodal Reasoning & Generation Models |
|
11:00 - 12:20 | Session 2 (20 minutes per talk) |
Prof. Bo Yang (PolyU) 3D Physics Learning |
|
Prof. Shengdong Zhao (CityU) Heads-up Computing: Towards the Next Interaction Paradigm |
|
Prof. Xiaoyu Zhang (CityU) Visual Analytics for Computer Vision Model Validation |
|
Prof. Liwei Wang (CUHK) Learning from Videos to 3D LLMs |
|
12:20 - 14:00 | Lunch at China Garden (G/F) (by invitation only) |
14:00 - 15:20 | Session 3 (20 minutes per talk) |
Prof. Xiangyu Yue (CUHK) Towards Unified Multimodal Learning |
|
Prof. Yifan Peng (HKU) Towards Domain-specific Computational Imaging Systems: When Optics Meets Algorithms |
|
Prof. Dan Xu (HKUST) Open-World Perception, Modeling, and Editable Generation |
|
Prof. Hengshuang Zhao (HKU) Intelligent Visual Spatial Understanding and Reasoning |
|
15:25 - 16:25 | Session 4 (20 minutes per talk) |
Prof. Liangqiong Qu (HKU) Advancing Federated Learning via Heterogeneity Evaluation, Optimization, and Privacy Preservation |
|
Prof. Xiaojuan Qi (HKU) Learning 3D/4D Representations from Videos |
|
Prof. Tianfan Xue (CUHK) Computational photography in the age of foundation and generative models |
|
16:30 - 17:50 | Session 5 (20 minutes per talk) |
Prof. Xihui Liu (HKU) From Diffusion to Autoregression for Visual Generation |
|
Prof. Zhen Li (CUHK-SZ) Multimodal 3D Perception and Reasoning |
|
Prof. Zeyu Wang (HKUST-GZ) Toward Synergistic Human-AI Content Creation |
|
Prof. Junhui Hou (CityU) Dynamic 3D Content Reconstruction and Generation |
|
Day 1 | Saturday, April 19, 2025 |
09:00 - 09:20 | Opening Session (with HKUST AI Film Festival), located at Shaw Auditorium |
09:20 - 10:20 | Keynote Speech I - Prof. Maneesh Agrawala (with HKUST AI Film Festival), located at Shaw Auditorium |
10:30 - 10:50 | Tea Break |
10:50 - 12:10 | Paper Session 1: Geometric and Texture Reconstruction (Chair: Xiao-Ming Fu) |
Yu Chen, Hongwei Lin, Yifan Xing Human Perception Faithful Curve Reconstruction Based on Persistent Homology and Principal Curve |
|
Haichuan Song, Xinyi Chen FEDNet: A Feature-Enhanced Diffusion Network for Efficient and Universal Texture Synthesis |
|
Hongxiang Huang, Guoyuan An, Jingzhen Lan, Lingfei Wang, Rui Wang, Yuchi Huo Ultra-High Resolution Facial Texture Reconstruction from a Single Image |
|
Alakh Aggarwal, Ningna Wang, Xiaohu Guo TexHOI: Reconstructing Textures of 3D Unknown Objects in Monocular Hand-Object Interaction Scenes |
|
Ziqiang Dang, Wenqi Dong, Zesong Yang, Bangbang Yang, Liang Li, Yuewen Ma, Zhaopeng Cui TexPro: Text-guided PBR Texturing with Procedural Material Modeling |
|
12:10 - 13:30 | Lunch |
13:30 - 14:50 | Paper Session 2: Rendering (Chair: Lin Gao) |
Shi Mao, Chenming Wu, Zhelun Shen, Yifan Wang, Dayan Wu, Liangjun Zhang NeuS-PIR: Learning Relightable Neural Surface using Pre-Integrated Rendering |
|
Xiaowei Song, Ju Zheng, Shiran Yuan, Huan-ang Gao, Jingwei Zhao, Xiang He, Weihao Gu, Zhicheng Wang, Hao Zhao SA-GS: Scale-Adaptive Gaussian Splatting for Training-Free Anti-Aliasing |
|
Dongyu Chen, Haoxiang Chen, Qunce Xu, Tai-Jiang Mu RS-SpecSDF: Reflection-Supervised Surface Reconstruction and Material Estimation for Specular Indoor Scenes |
|
Qi-Yuan Feng, Hao-Xiang Chen, Qun-Ce Xu, Tai-Jiang Mu SLS4D: Sparse Latent Space for 4D Novel View Synthesis (invited TVCG paper presentation) |
|
Chenhui Wang, Jianyang Zhang, Chen Li, Changbo Wang DC-APIC: A Decomposed Compatible Affine Particle in Cell Transfer Scheme for Non-sticky Solid-Fluid Interactions in MPM |
|
14:50 - 15:40 | Paper Session 3: 3D Generation (Chair: Zhonggui Chen) |
Rengan Xie, Wenting Zheng, Kai Huang, Yizheng Chen, Qi Wang, Qi Ye, Wei Chen, Yuchi Huo LDM: Large Tensorial SDF Model for Textured Mesh Generation [supplementary] |
|
Chen Wang, Guangshun Wei, James Kit Hon Tsoi, Zhiming Cui, Shuyi Lu, Zhenpeng Liu, Yuanfeng Zhou Diff-OSGN: Diffusion-based Occlusal Surface Generation Network with Geometric Constraints |
|
Zhicong Tang, Shuyang Gu, Chunyu Wang, Ting Zhang, Jianmin Bao, Dong Chen, Baining Guo VolumeDiffusion: Feed-forward Text-to-3D Generation with Efficient Volumetric Encoder |
|
15:40 - 16:00 | Tea Break |
16:00 - 17:30 | Poster Session 1: Detection, Segmentation and Medical Imaging |
Zhiwei Dong, Genji Yuan, Jinjiang Li AGTCNet: Hybrid Network Based on AGT and Curvature Information for Skin Lesion Detection |
|
Xin Chi, Yu Sun, Yingjun Zhao, Donghua Lu, Jun Yang, Yiting Zhang A Comprehensive Framework for Fine-Grained Object Recognition in Remote Sensing |
|
Jiayong Zhu, Tao Zhang SEA-Net: A Severity-Aware Network with Visual Prompt Tuning for Underwater Semantic Segmentation |
|
Tingwei Wen, Yao Lu, Xiaosheng Chen, Xinhai Lu, Guangming Lu Among General Spine Segmentation with Multi-scale and Discriminate Feature Fusion |
|
Yiquan Wu, Zhongtian Wang, You Wu, Ling Huang, Hui Zhou, Shuiwang Li Towards Reflected Object Detection: A Benchmark |
|
Yujie Liu, Zhonghao Du, Xuanting Li, Zongmin Li, Jiayue Fan, Chaozhi Yang SSCL: A Spatial-Spectral and Commonality Learning Network for Semi-Supervised Medical Image Segmentation |
|
Lisha Cui, Helong Jiao, Tengyue Liu, Chunyan Niu, Ming Ma, Xiaoheng Jiang, Mingliang Xu LAGNet: A Location-Aware Guidance Network for Weak and Strip Defect Detection |
|
Di Zhou, Jiahui Li, Haiying Wang, Matthew Burns, Meng Liu Consensus-aware Balance Learning for Sexually Suggestive Video Classification |
|
Hanli Zhao, Binhao Wang, Wanglong Lu, Juncong Lin Degradation-Aware Frequency-Separated Transformer for Blind Super-Resolution |
|
Fuxian Sui, Hua Wang, Fan Zhang A Multiscale Edge-Guided Polynomial Approximation Network for Medical Image Segmentation |
|
Shiwei He, Yingjuan Jia, Hanpu Wang, Xinyu Liu, Jianmeng Zhou, Huijie Gu, Mengyan Li, Tong Chen Weighted Spatiotemporal Feature and Multi-task Learning for Masked Facial Expression Recognition |
|
Junsheng Chang, Qin Shi, Yijun Zhang, Zongtang Hu, Xulun Ye MAAU-UIE: Multiple Attention Aggregation U-Net for Underwater Image Enhancement |
|
Jinglong Tian, Tianze Zhao, Zhijun Fan, Linlin Shen, Jieyao Wei, Qiumei Pu MANet-CycleGAN: An Unsupervised LDCT Image Denoising Method Based on Channel Attention and Multi-Scale Features |
|
Zhou Yang, Hua Wang, Fan Zhang HIFNet: Medical Image Segmentation Network Utilizing Hierarchical Attention Feature Fusion |
|
Ke Xu, Min Li, Guangjian Liu, Chen Chen, Cheng Chen, Enguang Zuo, Xiaoyi Lv MBGNet: Mamba-Based Boundary-Guided Multimodal Medical Image Segmentation Network |
|
Longtao Chen, Jinjie Zheng, Fenglei Xu, Jing Lou, Huanqiang Zeng MSD: Mask-Guided and Semantic-Guided Diffusion-Based Framework for Stone Surface Defect Detection |
|
Wenzhe Meng, Xiaoliang Zhu, Yanxiang Li YNet: medical image segmentation model based on wavelet transform boundary enhancement |
|
Qichang Wang, Ruixia Liu A New Heterogeneous Mixture of Experts Model for Deepfake Detection |
|
Xinyu Yang, Xiaochen Ma, Xuekang Zhu, Bo Du, Lei Su, Bingkui Tong, Zeyu Lei, Jizhe Zhou M3: Manipulation Mask Manufacturer for Arbitrary-Scale Super-Resolution Mask |
|
Xinrong Hu, Chao Fang, Yu Chen, Kai Yang, Chun-Mei Feng, Ping Li Agent-Conditioned Multi-Contrast MRI Super-Resolution for Cross-Subject |
|
Xin Feng, Jie Wang, Siping Wang, Jiehui Zhang LightStar-Net: A Pseudo-Raw Space Enhancement for Efficient Low-Light Object Detection |
|
Zunwang Ke, YinFeng Wang, Run Guo, Minghua Du, Ji-Sheng Zhou, Gang Wang, Yugui Zhang An Effective Algorithm for Skin Disease Segmentation Combining inter-channel Features and Spatial Feature Enhancement |
|
Haodong Li, Haicheng Qu DASSF: Dynamic-Attention Scale-Sequence Fusion for Aerial Object Detection |
|
18:20 - 19:20 | AI Film Festival Screening, located at Shaw Auditorium |
19:30 - 21:00 | Reception (with HKUST AI Film Festival), located at Shaw Auditorium |
Day 2 | Sunday, April 20, 2025 |
09:00 - 10:00 | Keynote Speech II - Inference-time Guided Generation Using Diffusion and Flow Models (by Prof. Minhyuk Sung) |
10:00 - 10:20 | Tea Break |
10:20 - 11:30 | Paper Session 4: Image/video Enhancement (Chair: Xuequan Lu) |
Yong Liu, Qingji Dong, Chao Zhu, Yu Guo, Fei Wang Towards Real-world Image Dehazing: A Tailored Dehazing Method and A High-Quality Dataset |
|
Ji-Wei Wang, Li-Yong Shen Temporal-Spatial Fusion Transformer for Video Demoiréing |
|
Yue Zhao, Zhonggui Chen, Juan Cao Palette-based Color Transfer for Images and Videos |
|
Simin Kou, Fang-Lue Zhang, Jakob Nazarenus, Reinhard Koch, Neil A. Dodgson OmniPlane: A Recolorable Representation for Dynamic Scenes in Omnidirectional Videos (invited TVCG paper presentation) |
|
11:30 - 13:10 | Lunch |
13:10 - 14:30 | Paper Session 5: Multimedia Generation (Chair: Deng Yu) |
Sen Peng, Weixing Xie, Zilong Wang, Xiaohu Guo, Zhonggui Chen, Baorong Yang, Xiao Dong RMAvatar: Photorealistic Human Avatar Reconstruction from Monocular Video Based on Rectified Mesh-embedded Gaussians |
|
Chenxu Zhang, Chao Wang, Jianfeng Zhang, Hongyi Xu, Guoxian Song, You Xie, Linjie Luo, Yapeng Tian, Jiashi Feng, Xiaohu Guo MagicTalk: Implicit and Explicit Correlation Learning for Diffusion-based Emotional Talking Face Generation |
|
Boyao Ma, Yuanping Cao, Lei Zhang Decoupled Two-Stage Talking Head Generation via Gaussian-Landmark-Based Neural Radiance Fields |
|
Zian Wang, Shihao Zou, Shiyao Yu, Mingyuan Zhang, Chao Dong Semantics-Aware Human Motion Generation from Audio Instructions |
|
Xufei Guo, Xiao Dong, Juan Cao, Zhonggui Chen CADTrans: A Code Tree-Guided CAD Generative Transformer Model with Regularized Discrete Codebooks |
|
14:30 - 14:50 | Tea Break |
14:50 - 15:30 | Paper Session 6: Action Analysis (Chair: Pengfei Xu) |
Songmiao Wang, Ruize Han, Wei Feng Concept-Guided Open-Vocabulary Temporal Action Detection |
|
Yuqing Zhang, Chen Pang, Pei Geng, Xuequan Lu, Lei Lyu Multi-scale adaptive large kernel graph convolutional network based on skeleton-based recognition |
|
15:30 - 17:00 | Poster Session 2: Generative Models, 3D and Geometry |
Jingze Chen, Lei Li, Zerui Tang, Qiqin Lin, Junfeng Yao CMU-Flownet: Exploring Point Cloud Scene Flow Estimation in Occluded Scenario |
|
Ye Wang, Ruiqi Liu, Zili Yi, Tieru Wu, Rui Ma SingleDream: Attribute-Driven T2I Customization from a Single Reference Image |
|
Zhikun Wen, Honghua Chen, Zhe Zhu, Zeyong Wei, Liangliang Nan, Mingqiang Wei CosCAD: Cross-Modal CAD Model Retrieval and Pose Alignment from a Single Image |
|
Kangneng Zhou, Yaxing Wang, Shuang Song, Jie Zhang, Ping Li 3DFaceController: Region-Controllable Face Synthesis via Decomposed and Recomposed Neural Radiance Fields |
|
Pengfei Deng, Tianjiao Zhang, Weize Quan, Hanyu Wang, Qinglin Lu, Zhifeng Li, Dong-Ming Yan Concept-Edge Fusion: Background Generation for Product Presentation Based on Text-to-Image Model |
|
Yishuo Fei, Chao Chen, Haipeng Liao, Mo Chen, Yuhui Yang, Dongming Lu High-Quality and Efficient Inverse Rendering for Geometry, Material, and Illumination Reconstruction |
|
Ran Zuo, Haoxiang Hu, Xiaoming Deng, Yaokun Li, Yu-Kun Lai, Cuixia Ma, Yong-Jin Liu, Hongan Wang Sketch-Guided Scene-level Image Editing with Diffusion Models |
|
Hongyu Chen, Xiao-Diao Chen An efficient and robust tracing method based on matrix representation for surface-surface intersection |
|
Hao Yu, Ruian Wang, Longdu Liu, Shuangmin Chen, Shiqing Xin, Zhenyu Shu, Changhe Tu Completing Dental Models While Preserving Crown Geometry and Meshing Topology |
|
Ziqi Zeng, Chen Zhao, Weiling Cai, Yuqing Guo Semantic-guided Coarse-to-Fine Diffusion Model for Self-supervised Image Shadow Removal |
|
Yujing Sun, Caiyi Sun, Yuan Liu, Yuexin Ma, Siu Ming Yiu Extreme Two-View Geometry From Object Poses with Diffusion Models |
|
Yu-Jie Yuan, Leif Kobbelt, Jie Yang, Yu-Kun Lai, Lin Gao TPD-NeRF: Temporally Progressive Reconstruction of Dynamic Neural Radiance Fields from Monocular Video [supplementary] |
|
Longdu Liu, Hao Yu, Shiqing Xin, Shuangmin Chen, Hongwei Lin, Wenping Wang, Changhe Tu Direct Extraction of High-Quality and Feature-Preserving Triangle Meshes from Signed Distance Functions |
|
Xiangyu Su, Sida Peng, Oliver van Kaick, Hui Huang, Ruizhen Hu MTScan: Material Transfer from Partial Scans to CAD models |
|
Qifeng Chen, Kai Huang, Yuchi Huo, Qi Wang, Wenting Zheng, Rong Li, Rengan Xie HR Human: Modeling Human Avatars with Triangular Mesh and High-Resolution Textures from Videos |
|
Xinqi Liu, Chenming Wu VGA: Reconstructing Vivid 3D Gaussian Avatars from Monocular Videos |
|
Shan Yue, Hai Huang, Zhenqi Tang, Yutong Zheng, Zhou Fang TCDNet: Texture and Color Dynamic Network for Image Harmonization |
|
Fuyang Liu, Jianjun Li Unsupervised Monocular Depth Estimation for Foggy Images with Domain Separation and Self-depth Domain Conversion |
|
Qingyi Zhu, Ruochen Jin, Zhiwei Zhang, Yishen Xue, Xin Tan, Lizhuang Ma TAD: A plug-and-play Task Arithmetic approach for augmenting Diffusion models |
|
Yu Liu, Fatimah Khalid, Cunrui Wang, Mas Rina Mustaffa, Azreen Azman DiffVecFont: Fusing Dual-Mode Reconstruction Vector Fonts via Masked Diffusion Transformers |
|
Zhenzhen Xiao, Heng Liu, Bingwen Hu Unwarping Screen Content Images via Structure-texture Enhancement Network and Transformation Self-estimation |
|
Qun-Ce Xu, Yan-Pei Cao, Weihao Cheng, Tai-Jiang Mu, Ying Shan, Yong-Liang Yang, Shi-Min Hu High-accuracy Fractured Object Reassembly under Arbitrary Poses |
|
17:20 | Shuttle Bus Departure to the Banquet |
18:30 - 20:00 | Conference Banquet |
Day 3 | Monday, April 21, 2025 |
09:00 - 10:00 | Keynote Speech III - Simulate Everything, Everywhere, All At Once (by Prof. Eitan Grinspun) |
10:00 - 10:20 | Tea Break |
10:20 - 11:40 | Paper Session 7: Geometry Processing (Chair: Dong-Ming Yan) |
Di Shao, Yaping Jing, Xinkui Zhao, Shasha Mao, Lei Lyu, Xiao Liu, Xuequan Lu DS-MAE: Dual-Siamese Masked Autoencoders for Point Cloud Analysis |
|
Ao Zhang, Qing Fang, Peng Zhou, Xiao-Ming Fu Topology-Controlled Laplace-Beltrami Operator on Point Clouds Based on Persistent Homology |
|
Yuan-Yuan Cheng, Qing Fang, Ligang Liu, Xiao-Ming Fu Developable approximation via Isomap on Gauss image |
|
Gaoyang Zhang, Yingxi Chen, Hanchao Li, Xinguo Liu Efficient and Structure-Aware 3D Reconstruction via Differentiable Primitives Abstraction |
|
Jiachen Liu, Yuan Xue, Haomiao Ni, Rui Yu, Zihan Zhou, Sharon X. Huang Computer-Aided Layout Generation for Building Design: A Review |
|
11:40 - 13:00 | Lunch |
13:00 - 14:10 | Paper Session 8: Optimization and Application (Chair: Tai-Jiang Mu) |
Gang Xu, Haoyu Liu, Biao Leng, Zhang Xiong ImVoxelENet: Image to Voxels Epipolar Transformer for Multi-View RGB-based 3D Object Detection |
|
Yanchao Bi, Yang Ning, Xiushan Nie, Xiankai Lu, Ruiheng Zhang, Huanlong Zhang FGHDet: Delving Into Fine-grained Features With Head Selection for UAV Object Detection |
|
Siying Huang, Xin Yang, Zhengda Lu, Hongxing Qin, Huaiwen Zhang, Yiqun Wang L2-GNN: Graph Neural Networks with Fast Spectral Filters Using Twice Linear Parameterization |
|
Zihan Zhou, Jiacheng Pan, Xumeng Wang, Dongming Han, Fangzhou Guo, Minfeng Zhu, Wei Chen A Summarization-Based Pattern-Aware Matrix Reordering Approach [supplementary] |
|
14:10 - 14:30 | Tea Break |
14:30 - 16:00 | Poster Session 3: Multimodal Learning, Unsupervised Methods, and Applications |
Wenbin Wu, Zhiwei Zhang, Xin Tan, Zhizhong Zhang, Lizhuang Ma DepthFisheye: Efficient Fine-Tuning of Depth Estimation Models for Fisheye Cameras |
|
Yongbiao Gao, Xiangcheng Sun, Guohua Lv, Deng Yu, Sijie Niu Reinforced Label Denoising for Weakly-Supervised Audio-Visual Video Parsing |
|
Jiangnan Xia, Zhiyuan Zhang, Yanyin Guo, Qilong Wu, Yi Li, Jianghan Cheng, Junwei Li Bridging the Modality Gap: Advancing Multimodal Human Pose Estimation with Modality-Adaptive Pose Estimator and Novel Benchmark Datasets |
|
Xiaole Zhu, Zongtao Duan, Junchen Huang, Xing Sheng Momentum-Based Uni-Modal Soft-Label Alignment and Multi-Modal Latent Projection Networks for Optimizing Image-Text Retrieval |
|
Keyang Lin, Zhijun Fang, Sicong Zang, Hang Wu Learning Adaptive Basis Fonts to Fuse Content Features for Few-shot Font Generation |
|
Xiaoyu Guan, Yihao Li, Tianyu Huang TaiCrowd: A High-Performance Simulation Framework for Massive Crowd |
|
Wei Ge, Yongwei Nie, Fei Ma, Keke Tang, Fei Richard Yu, Hongmin Cai, Ping Li Training-Free Language-Guided Video Summarization via Multi-Grained Saliency Scoring |
|
Benchao Li, Yun Zou, Ruisheng Ran MCFG with GUMAP: A Simple and Effective Clustering Framework on Grassmann Manifold |
|
Yun Zou, Benchao Li, Ruisheng Ran Joint UMAP for Visualization of Time-Dependent Data |
|
Chengrong Yang, Qiwen Jin, Xiaoguo Zhang, Yujue Zhou Feature Disentanglement and Fusion Model for Multi-Source Domain Adaptation with Domain-Specific Features |
|
Hao Tong, Jiawei Liu, Yong Wu, Guozhi Zhao, Fanrui Zhang, Zheng-Jun Zha Multi-Granularity and Multi-Modal Prompt Learning for Person Re-Identification |
|
Lu Xu, Shuaixin Li, Xin Zhou, Xiaozhou Zhu, Wen Yao Local and Global Feature Cross-attention Multimodal Place Recognition |
|
Hongchao Zhong, Li Yu, Longkun Zou, Ke Chen Unsupervised Domain Adaptation on Point Cloud Classification via Imposing Structural Manifolds into Representation Space |
|
Kailang Hu, Yixiao Lu, Huibing Li, Xuan Song A Trademark Retrieval Method Based on Self-Supervised Learning |
|
Shu Liu, Melikamu Liyih Sinishaw, Luo Zheng DIMATrack: Dimension Aware Data Association for Multi-Object Tracking |
|
Qinghua Song, Xiaolei Wang Efficient Transformer Network for Visible and Ultraviolet Object Tracking |
|
Zheng Zhang, RuiQing Yang, ChuanLei Zhang IML-CMM - A Multimodal Sentiment Analysis Framework Integrating Intra-Modal Learning and Cross-Modal Mixup Enhancement |
|
Junjiang Liu, Dandan Sun, Hailun Xia, Jiangtao Bai, Xinyue Fan Weaken Noisy Feature: Boosting Semi-Supervised Learning by Noise Estimation |
|
Ruizhong Du, Luman Zhao, Mingyue Li, Yidan Li, Shenyu Li, Caixia Ma ADMMOA: Attribute-Driven Multimodal Optimization for Face Recognition Adversarial Attacks |
|
Mingming Li, Fei Wu, Yinjie Wang LightGR-Transformer: Light Grouped Residual Transformer for Multispectral Object Detection |
|
Weiye Peng, Shenghua Zhong Multi-Dimension Full Scene Integrated Visual Emotion Analysis Network |
|
Shan Huang, Wenhua Qian Gap-KD: Bridging the Significant Capacity Gap Between Teacher and Student Model |
|
16:00 - 16:30 | Closing Session |
Keynote Speakers
Prof. Maneesh Agrawala, Stanford University, USA
Title:
Beyond Unpredictable Black Boxes: Designing Generative AI For Iterative Refinement
Abstract:
Human creation of high-quality content often requires significant iteration. People produce a coarse initial draft and refine it step-by-step towards a final result. While modern generative AI tools are capable of producing surprisingly high-quality content from simple text prompts, they do not support this iterative refinement workflow. Instead, today's AI tools are black boxes, making it impossible for users to build a mental/conceptual model that can predict how an input prompt will be transmuted into output content. The lack of predictability forces users to rely on iterative trial-and-error, repeatedly crafting a prompt, using the AI to generate a result, and then adjusting the prompt to try again. In this talk, I'll outline some features generative AI tools should provide to support iterative refinement workflows rather than iterative trial-and-error. These features include hierarchical decomposition of the creation task and consistency of the output content from iteration to iteration. Finally, I'll suggest some approaches we might use to build generative AI tools that provide such features and demonstrate a few implementations of these ideas that we have developed in our lab at Stanford.
Speaker's Biography:
Maneesh Agrawala is the Forest Baskett Professor of Computer Science and Director of the Brown Institute for Media Innovation at Stanford University. He works on computer graphics, human-computer interaction, and visualization. His focus is on investigating how cognitive design principles can be used to improve the effectiveness of audio/visual media. The goals of this work are to discover the design principles and then instantiate them as constraints and controls in generative AI tools. Honors include an Okawa Foundation Research Grant (2006), an Alfred P. Sloan Foundation Fellowship (2007), an NSF CAREER Award (2007), a SIGGRAPH Significant New Researcher Award (2008), a MacArthur Foundation Fellowship (2009), an Allen Distinguished Investigator Award (2014) and induction into the SIGCHI Academy (2021). He was named an ACM Fellow in 2022.
Prof. Eitan Grinspun, University of Toronto, Canada
Title:
Simulate Everything, Everywhere, All At Once
Abstract:
Reduced order modeling (ROM) has long promised the ability to simulate complex dynamics at lightning speed—at the cost of specialization. Traditional ROMs are typically tied to a specific scenario, geometry, and discretization, making them brittle in the face of changing applications, shapes, or numerical methods.
If we could lift these restrictions, then we could unlock the potential of dramatically more expressive ROMs that can accelerate a wide range of simulations and applications. These ROMs could absorb data from disparate simulations—even those conducted on different discretizations—and generalize across diverse shapes.
I will present first steps toward these goals. Using neural fields to build continuous, differentiable representations of physical phenomena, we can learn a low-dimensional manifold of kinematic representations not tied to one shape or discretization. These new ROMs enable fast, accurate simulations that can train on—and generalize across—grids, meshes, and point clouds alike, and their generalization across shapes has exciting connections to spectral geometry processing. I believe that these approaches point to a new kind of simulation engine: one that is fast, general, and geometry-aware, bringing us one step closer to simulating everything, everywhere, all at once.
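To make the idea of a discretization-agnostic kinematic representation concrete, here is a minimal illustrative sketch (a toy PyTorch example under my own assumptions, not the speaker's code): a coordinate-based neural field maps a query point together with a low-dimensional latent (reduced) state to a displacement, so the same learned reduced space can be queried on a grid, a mesh, or a point cloud without re-discretization. The class and variable names below are hypothetical.

import torch

# Toy neural-field reduced kinematic model (illustrative only).
# A coordinate MLP maps (query point, low-dimensional latent code) -> displacement,
# so the reduced representation is independent of any particular mesh or grid.
class NeuralFieldROM(torch.nn.Module):
    def __init__(self, latent_dim=8, hidden=64):
        super().__init__()
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(3 + latent_dim, hidden), torch.nn.SiLU(),
            torch.nn.Linear(hidden, hidden), torch.nn.SiLU(),
            torch.nn.Linear(hidden, 3),  # displacement at the query point
        )

    def forward(self, points, z):
        # points: (N, 3) query locations from any discretization;
        # z: (latent_dim,) reduced kinematic state shared by all query points.
        z_rep = z.expand(points.shape[0], -1)
        return self.mlp(torch.cat([points, z_rep], dim=1))

field = NeuralFieldROM()
z = torch.zeros(8)                  # reduced coordinates a time integrator would evolve
grid_pts = torch.rand(1000, 3)      # e.g. nodes of a background grid
mesh_verts = torch.rand(250, 3)     # e.g. vertices of a surface mesh
disp_grid = field(grid_pts, z)      # the same latent state queried on a grid...
disp_mesh = field(mesh_verts, z)    # ...and on a mesh, with no re-meshing or resampling
print(disp_grid.shape, disp_mesh.shape)

In practice the latent code and the field weights would be fit to simulation data gathered from heterogeneous discretizations, which is what makes the reduced space reusable across them.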
Speaker's Biography:
Eitan Grinspun is Associate Chair, Communications, Mentoring and Inclusion at the Department of Computer Science at the University of Toronto, where he is also a Professor of Computer Science and Mathematics. He was previously the Co-Director of the Columbia Computer Graphics Group at Columbia University (2004-2021), Professeur d'Université Invité at l'Université Pierre et Marie Curie in Paris (2009), a Research Scientist at New York University's Courant Institute (2003-2004), a graduate student at the California Institute of Technology (1997-2003), and an undergraduate in Engineering Science at the University of Toronto (1993-1997). He has been an NVIDIA Fellow (2001), Eberhart Distinguished Lecturer (2003), NSF CAREER Awardee (2007), Alfred P. Sloan Research Fellow (2010-2012), one of Popular Science magazine's "Brilliant Ten Scientists" (2011), and Fast Company magazine's "Most Creative People in Business" (2013). Technologies developed by his lab are used in products such as Adobe Photoshop & Illustrator, at major film studios, and in soft matter physics and engineering research. He has been profiled in The New York Times, Scientific American, New Scientist, and mentioned in Variety. His film credits include The Hobbit, Rise of the Planet of the Apes, and Steven Spielberg's The Adventures of Tintin.
Prof. Minhyuk Sung, KAIST, South Korea
Title:
Inference-time Guided Generation Using Diffusion and Flow Models
Abstract:
Recent breakthroughs in generative AI have transformed the creative process, making it easier than ever to generate realistic images and videos. While the quality of generated outputs has reached unprecedented levels of realism, the challenge now lies in improving controllability and alignment with user preferences. Although text-to-image generative models have become prevalent, text input alone is often insufficient to provide precise spatial or stylistic control. Traditionally, users have provided such inputs through direct interactions, such as mouse clicks, but supporting these traditional input methods has become increasingly challenging. Moreover, despite the widespread use of text-to-image generation, text-image alignment remains far from perfect, particularly for complex prompts. To enhance controllability and alignment with user intent, recent advancements in LLMs have shifted focus beyond scaling training data and model size to scaling inference-time computation, as demonstrated by the AGI-level performance of models like GPT-4o and DeepSeek. In this talk, I will discuss inference-time generation techniques for guided image and video generation, categorized into three main approaches. First, noise manipulation leverages the observation that adjusting noise during the denoising process influences the final output, enabling alignment with user-defined spatial guidance. Second, gradient-descent-based algorithms utilize the expectation of the final output at an intermediate denoising step, which can be interpreted through the lens of score distillation and combined with it. Lastly, particle sampling exploits the stochastic nature of the generative process, branching out the generation to scale up and search for the desired output, albeit at the cost of increased inference computation. I will explore their capabilities, limitations, and future directions.
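To give a concrete flavor of the second family above (gradient-based guidance using the expectation of the final output at an intermediate denoising step), here is a minimal illustrative PyTorch sketch. It is a toy example under my own assumptions, not the speaker's implementation: the denoiser is an untrained placeholder standing in for a real diffusion model, and guidance_loss is a hypothetical user objective such as a spatial-layout constraint.

import torch

T = 50
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

# Placeholder epsilon-prediction network; a real model would also condition on text.
denoiser = torch.nn.Sequential(
    torch.nn.Linear(3, 64), torch.nn.SiLU(), torch.nn.Linear(64, 2)
)

def eps_pred(x_t, t):
    t_feat = torch.full((x_t.shape[0], 1), t / T)
    return denoiser(torch.cat([x_t, t_feat], dim=1))

def guidance_loss(x0_hat):
    # Hypothetical user-defined objective on the predicted clean sample.
    target = torch.tensor([[1.0, -1.0]])
    return ((x0_hat - target) ** 2).mean()

guidance_scale = 5.0
x_t = torch.randn(4, 2)  # start from pure noise
for t in reversed(range(T)):
    x_t = x_t.detach().requires_grad_(True)
    eps = eps_pred(x_t, t)
    a_bar = alpha_bars[t]
    # Expectation of the final (clean) output given the current noisy state.
    x0_hat = (x_t - (1 - a_bar).sqrt() * eps) / a_bar.sqrt()
    # Gradient of the guidance objective with respect to the current state.
    grad = torch.autograd.grad(guidance_loss(x0_hat), x_t)[0]
    with torch.no_grad():
        # Standard ancestral (DDPM-style) update ...
        mean = (x_t - betas[t] / (1 - a_bar).sqrt() * eps) / alphas[t].sqrt()
        noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
        x_t = mean + betas[t].sqrt() * noise
        # ... nudged downhill on the guidance loss (inference-time guidance).
        x_t = x_t - guidance_scale * grad
# x_t now approximates a sample pushed toward the guidance objective.

The same loop structure extends naturally to the particle-sampling family described above: maintain several x_t candidates in parallel and keep the branches with the lowest guidance loss, at the cost of extra inference computation.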
Speaker's Biography:
Minhyuk Sung is an Associate Professor in the School of Computing at KAIST, affiliated with the Graduate Schools of AI and Metaverse. Previously, he was a Research Scientist at Adobe Research. He earned his Ph.D. from Stanford University under Prof. Leonidas J. Guibas. His research focuses on generating, manipulating, and analyzing various visual data, including images, videos, and 3D data. He has served on program committees for SIGGRAPH Asia (2022-2025), Eurographics (2022, 2024-2025), Pacific Graphics (2023, 2025), ICCV (2025), ICLR (2025), and AAAI (2023, 2024). He received the Asia Graphics Young Researcher Award in 2024.