Updated on Nov 25, 2024

How Stable Diffusion Video Model Can Help Generate Stock Footage for Media Companies

A step-by-step guide to installing Stable Video Diffusion


Stable Video Diffusion: An Overview

On 21 November 2023, Stability AI launched Stable Video Diffusion, its first foundation model for generative video, built on the Stable Diffusion image model. It is a latent diffusion model trained to generate short video clips from a conditioning image; in other words, it builds and predicts a video sequence starting from a base image. It was released as a set of two models (SVD and SVD-XT), capable of generating 14 and 25 frames respectively at customizable frame rates between 3 and 30 fps.

The main strength of Stable Video Diffusion is its ability to generate stable, high-quality videos with temporal consistency. As of now, the model is intended for research purposes only and can be used for tasks such as text-to-video and image-to-video generation.

The Stock Footage Problem: Why B-Roll Is a Challenge for Media Companies

B-roll footage is supplementary footage used to support the main footage in a video production. It is often used to provide context, establish place and time, and add visual interest to a story.

The main challenge with B-roll footage is that it can be time-consuming and expensive to produce, and it can be difficult to find footage that matches the specific needs of the production studio. The 'Stock Footage Problem' poses several other challenges for media companies using B-roll: complex licensing issues, a lack of originality and customization options, and time-consuming searches through extensive libraries. The quality and relevance of the footage can vary, making it difficult to maintain consistency in style and visual quality when integrating stock footage with the primary video footage. Overuse of popular clips can also lead to predictability, and stock footage libraries often offer a limited range of representation and diversity. To address these issues, media companies are exploring solutions such as building their own B-roll libraries, using AI for efficient search and customization, or sourcing more diverse and unique content from community platforms.

Stable Video Diffusion addresses these challenges by providing an open-source solution for media companies. The code for the model is available on Stability AI's GitHub repository, and the weights required to run the model locally can be found on their Hugging Face page.

In this blog, we’ll go through the step-by-step process of running Stable Video Diffusion on your system.

Installation

Clone the Repo

git clone https://github.com/Stability-AI/generative-models.git
cd generative-models

Installing the Requirements

We install PyTorch 2.0 and the remaining requirements in a virtual environment.

# create and activate a virtual environment
python3 -m venv .pt2
source .pt2/bin/activate
# install required packages from pypi
pip3 install -r requirements/pt2.txt
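
Before moving on, it is worth confirming that PyTorch was installed correctly and can see your GPU. A quick sanity check, run inside the activated .pt2 environment (this assumes a CUDA-capable GPU), is:

python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"

If this prints a 2.x version and True, the environment is ready for inference.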

Install sgm

pip install .
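
This installs the repository's sgm package into the virtual environment. If you want to verify the install, a minimal import check (assuming the package is importable as sgm, as in the repository) is:

python3 -c "import sgm; print('sgm imported successfully')"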

Install sdata for Training

pip3 install -e git+https://github.com/Stability-AI/datapipelines.git@main#egg=sdata
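
Note that sdata is only needed if you plan to train or fine-tune the model; for inference with the demo script you can skip this step. If you do install it, a quick check (assuming the package is importable as sdata) is:

python3 -c "import sdata"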

Setting Up the Packaging

pip install hatch
hatch build -t wheel
pip install dist/*.whl

Download the Model Weights

First, create a directory named checkpoints.

mkdir checkpoints

Now, run the following commands to download the weights.

wget -O checkpoints/svd_xt.safetensors https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt/resolve/main/svd_xt.safetensors?download=true

wget -O checkpoints/svd_xt_image_decoder.safetensors https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt/resolve/main/svd_xt_image_decoder.safetensors?download=true

wget -O checkpoints/svd.safetensors https://huggingface.co/stabilityai/stable-video-diffusion-img2vid/resolve/main/svd.safetensors?download=true

wget -O checkpoints/svd_image_decoder.safetensors https://huggingface.co/stabilityai/stable-video-diffusion-img2vid/resolve/main/svd_image_decoder.safetensors?download=true
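
The weight files are several gigabytes each, so the downloads can take a while. Once they finish, you can confirm that all four checkpoints are in place with:

ls -lh checkpoints/

You should see svd.safetensors, svd_xt.safetensors, and the two image decoder files listed with non-trivial sizes.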

Launch SVD

Copy the demo script into the repository's root directory, then launch it with Streamlit.

cp scripts/demo/video_sampling.py video_sampling.py

streamlit run video_sampling.py
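
By default, Streamlit serves the demo at http://localhost:8501. If you are running on a remote GPU server, you can expose the app on a specific address and port using standard Streamlit flags (these are Streamlit options, not specific to SVD):

streamlit run video_sampling.py --server.address 0.0.0.0 --server.port 8501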

Generating Stock Footage

You have now run Stable Video Diffusion successfully and can generate your own customized stock footage.

References

https://github.com/Stability-AI/generative-models
