Yizhi Song

I'm a CS PhD candidate in the CGVLab at Purdue University, advised by Prof. Daniel Aliaga. Before coming to Purdue, I received my B.S. in Computer Science from Zhejiang University. I interned at Qualcomm in summer 2021 and worked as a research intern at Adobe in summer 2022 and summer 2023. During my internships, I was fortunate to work with Dr. Meng-Lin Wu, Dr. Zhifei Zhang, and Dr. Zhe Lin.

I'm interested in diffusion models (SD and DiT), multi-modal LLMs (LLaVA), and customized image editing (especially identity preservation).

Email  /  CV  /  Scholar  /  Github  /  LinkedIn

Research

Refine-by-Align: Reference-Guided Artifacts Refinement through Semantic Alignment
Yizhi Song, Liu He, Zhifei Zhang, Soo Ye Kim, He Zhang, Wei Xiong, Zhe Lin, Brian L. Price, Scott Cohen, Jianming Zhang, Daniel Aliaga
PDF (coming soon!) / Project Page

We introduce a new task: reference-guided refinement of generative artifacts. Given a synthesized image, a reference and a free-form mask marking the artifacts, the model automatically identifies the correspondence in the reference and extracts the localized feature, which is then used to fix the artifacts.

GroundingBooth: Grounding Text-to-Image Customization
Zhexiao Xiong, Wei Xiong, Jing Shi, He Zhang, Yizhi Song, Nathan Jacobs
PDF / Project Page

We introduce GroundingBooth, a framework that achieves zero-shot instance-level spatial grounding on both foreground subjects and background objects in the text-to-image customization task.

Kubrick: Multimodal Agent Collaborations for Video Generation
Liu He, Yizhi Song, Hejun Huang, Xin Zhou
PDF / Project Page

We build the first multimodal agent-based video generation pipeline through 3D engine scripting. Given any text prompt, multimodal agents collaborate to produce detailed Blender scripts that generate videos of any length with plausible character and motion consistency.

IMPRINT: Generative Object Compositing by Learning Identity-Preserving Representation
Yizhi Song, Zhifei Zhang, Zhe Lin, Scott Cohen, Brian L. Price, Jianming Zhang, Soo Ye Kim, He Zhang, Wei Xiong, Daniel Aliaga
CVPR 2024
PDF / Project Page

Our tuning-free model achieves advanced image composition with decent identity preservation, automatic object viewpoint/pose adjustment, color and lighting harmonization, and shadow synthesis. All these effects are achieved in a single framework!

Thinking Outside the BBox: Unconstrained Generative Object Compositing
Gemma Canet Tarrés, Zhe Lin, Zhifei Zhang, Jianming Zhang, Yizhi Song, Dan Ruta, Andrew Gilbert, John Collomosse, Soo Ye Kim
ECCV 2024
PDF / Project Page (coming soon!)

We introduce a novel task, unconstrained image compositing, where the generation is not bounded by the input mask and can even occur without one (thus supporting automatic object placement). This allows the generation of realistic object effects (shadows and reflections) that go beyond the mask while preserving the surrounding background.

ObjectStitch: Object Compositing With Diffusion Model
Yizhi Song, Zhifei Zhang, Zhe Lin, Scott Cohen, Brian L. Price, Jianming Zhang, Soo Ye Kim, Daniel Aliaga
CVPR 2023
Project Page / Paper / arXiv / Reposted by AK

We define a novel task, generative image compositing, and present the first diffusion model-based framework, ObjectStitch, which handles multiple aspects of compositing, such as viewpoint, geometry, lighting, and shadow, together in a unified model.

A Three-Stage Real-Time Detector for Traffic Signs in Large Panoramas
Yizhi Song, Ruochen Fan, Sharon Huang, Zhe Zhu, Ruofeng Tong
CVM 2019 (oral)
PDF

We propose a novel three-stage traffic sign detection framework which achieves state-of-the-art detection accuracy in real time.

Work Experience

Research scientist intern at Adobe (summer 2023)

  • Project: object-centric image compositing.

Research scientist intern at Adobe (summer 2022)

  • Project: object-centric image compositing.

Interim engineering intern at Qualcomm (summer 2021)

  • Project: depth-aware image inpainting.

Academic Service

Reviewer: CVPR, ECCV, ACM MM, NeurIPS

Teaching

  • CS 334 (Fundamentals of Computer Graphics)
  • CS 252 (Systems Programming)

Misc

I'm also an angler and a fan of musicals and original movie soundtracks.


This website is built on Jon Barron's website. Thanks to Leonid Keselman for the Jekyll template.