Consistent Video-to-Video Transfer Using Synthetic Dataset
Jiaxin Cheng, Tianjun Xiao, Tong He
Amazon Web Services Shanghai AI Lab

Teaser

Abstract

We introduce a novel and efficient approach for text-based video-to-video editing that eliminates the need for resource-intensive per-video-per-model finetuning. At the core of our approach is a synthetic paired video dataset tailored for video-to-video transfer tasks. Inspired by Instruct Pix2Pix’s image transfer via editing instructions, we adapt this paradigm to the video domain. By extending Prompt-to-Prompt to videos, we efficiently generate paired samples, each consisting of an input video and its edited counterpart. Alongside this, we introduce Long Video Sampling Correction, applied during sampling, to keep long videos consistent across batches. Our method outperforms current methods such as Tune-A-Video, marking progress in text-based video-to-video editing and suggesting avenues for further exploration and deployment.

Updates
  • 2023/11/29: We have updated the paper with more comparisons to recent baseline methods and updated the comparison video.
Editing Results

Our method requires only an input video and an editing prompt to modify the video. There is no need to fine-tune the model on each video. Left: Input videos. Right: Edited videos.

airbrush-painting_object.gif
airplane-and-contrail_background.gif
audi-snow-trail_background.gif
cat-in-the-sun_background.gif
cat-in-the-sun_object.gif
dirt-road-driving_style.gif
drift-turn_style.gif
earth-full-view_background.gif
eiffel-flyover_background.gif
ferris-wheel-timelapse_background.gif
gold-fish_style.gif
ice-hockey_object.gif
miami-surf_background.gif
raindrops_style.gif
red-roses-sunny-day_background.gif
red-roses-sunny-day_style.gif
swans_background.gif
swans_object.gif
Synthetic Paired Video Dataset
The pipeline for generating the synthetic dataset uses a large language model. Its outputs are a prompt triplet (the input prompt, the edit prompt, and the edited prompt) and a corresponding pair of videos.
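
One way to picture a single dataset sample (a prompt triplet plus a pair of videos) is as a small data structure. The sketch below is only an illustration, not the released data format: the class names, the field names, the representation of a video as a list of frame paths, and the equal-frame-count check are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PromptTriplet:
    """Hypothetical container for the three prompts of one sample."""
    input_prompt: str   # describes the input video
    edit_prompt: str    # the editing instruction
    edited_prompt: str  # describes the video after the edit


@dataclass
class PairedSample:
    """Hypothetical container for one paired sample."""
    prompts: PromptTriplet
    input_video: List[str]   # assumed: paths to the input video frames
    edited_video: List[str]  # assumed: paths to the edited video frames


def is_valid(sample: PairedSample) -> bool:
    """Sanity check (an assumption, not a rule from the paper):
    every prompt is non-empty and both videos have the same,
    non-zero number of frames."""
    p = sample.prompts
    prompts_ok = all(
        s.strip() for s in (p.input_prompt, p.edit_prompt, p.edited_prompt)
    )
    frames_ok = len(sample.input_video) == len(sample.edited_video) > 0
    return prompts_ok and frames_ok


# Illustrative values only; these prompts and paths are made up.
example = PairedSample(
    prompts=PromptTriplet(
        input_prompt="a cat sitting in the sun",
        edit_prompt="make it a dog",
        edited_prompt="a dog sitting in the sun",
    ),
    input_video=["in_000.png", "in_001.png"],
    edited_video=["out_000.png", "out_001.png"],
)
```

Keeping the three prompts together with both videos means a loader can check each sample on its own before training, for example by filtering with `is_valid`.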

Links to download data_pipe

Examples of Generated Synthetic Videos
synthetic_video_106_0.gif
synthetic_video_116_0.gif
synthetic_video_141_0.gif
synthetic_video_18_0.gif
synthetic_video_192_0.gif
synthetic_video_197_0.gif
synthetic_video_1_0.gif
synthetic_video_24_0.gif
synthetic_video_81_0.gif
synthetic_video_92_0.gif
Comparison to Other Editing Methods
Links to the baselines used in the video:
  • Tune-A-Video
  • Control Video
  • Vid2Vid Zero
  • Video P2P
  • TokenFlow
  • Render A Video
  • Pix2Video
BibTeX
@article{insv2v,
  title={Consistent Video-to-Video Transfer Using Synthetic Dataset},
  author={Jiaxin Cheng and Tianjun Xiao and Tong He},
  year={2023},
}