Step-by-Step Manual to Install Stable Video Diffusion
On 21 November 2023, Stability AI launched Stable Video Diffusion, its first foundation model for generative video, built on the image model Stable Diffusion. It is a latent diffusion model trained to generate short video clips conditioned on a single image; in other words, it predicts a video sequence starting from a base frame. Stable Video Diffusion ships as a set of two models (SVD and SVD-XT) capable of generating 14 and 25 frames respectively, at customizable frame rates between 3 and 30 fps.
The main strength of Stable Video Diffusion is its ability to generate high-quality videos with strong temporal consistency. For now, the model is intended for research purposes only and can be used for tasks such as text-to-video and image-to-video generation.
B-roll footage is supplementary footage used to support the main footage in a video production. It often provides context, establishes place and time, and adds visual interest to a story.
The main challenge with B-roll footage is that it can be time-consuming and expensive to produce, and it can be difficult to find footage that matches the specific needs of a production. The 'Stock Footage Problem' poses several other challenges for media companies using B-roll: complex licensing, limited originality and customization options, and time-consuming searches through extensive libraries. The quality and relevance of the footage can also vary, making it difficult to maintain a consistent style and visual quality when integrating stock footage with the primary video footage. Overuse of popular clips leads to predictability, and stock footage libraries often offer a limited range of representation and diversity. To address these issues, media companies are exploring solutions such as building their own B-roll libraries, using AI for efficient search and customization, or sourcing more diverse and unique content from community platforms.
Stable Video Diffusion addresses these challenges by providing an open-source solution for media companies. The code for the model is available on Stability AI's GitHub repository, and the weights required to run the model locally can be found on their Hugging Face page.
In this post, we'll walk through the step-by-step process of running Stable Video Diffusion on your own system.
Clone the Repo
git clone https://github.com/Stability-AI/generative-models.git
cd generative-models
Installing the Requirements
Next, create a virtual environment and install PyTorch 2.0 along with the repository's other requirements.
# install required packages from pypi
python3 -m venv .pt2
source .pt2/bin/activate
pip3 install -r requirements/pt2.txt
pip3 install .
pip3 install -e git+https://github.com/Stability-AI/datapipelines.git@main#egg=sdata
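Before moving on, it's worth confirming that PyTorch installed correctly and can see your GPU. A minimal sanity check, run from inside the activated virtual environment:

import torch  # quick check on the fresh install
print(torch.__version__)          # should report a 2.x release
print(torch.cuda.is_available())  # should print True on a CUDA-capable machine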
Setting Up the Packaging
pip3 install hatch
hatch build -t wheel
pip3 install dist/*.whl
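If the wheel built and installed cleanly, the repository's sgm package should now be importable. A quick way to verify:

import sgm  # the model code packaged by the hatch build above
print(sgm.__file__)  # prints the install location if everything worked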
Download the Model Weights
First, create a directory named checkpoints.
mkdir checkpoints
Now, run the following commands to download the weights.
wget -O checkpoints/svd_xt.safetensors "https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt/resolve/main/svd_xt.safetensors?download=true"
wget -O checkpoints/svd_xt_image_decoder.safetensors "https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt/resolve/main/svd_xt_image_decoder.safetensors?download=true"
wget -O checkpoints/svd.safetensors "https://huggingface.co/stabilityai/stable-video-diffusion-img2vid/resolve/main/svd.safetensors?download=true"
wget -O checkpoints/svd_image_decoder.safetensors "https://huggingface.co/stabilityai/stable-video-diffusion-img2vid/resolve/main/svd_image_decoder.safetensors?download=true"
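Each checkpoint is several gigabytes, and interrupted downloads occasionally leave truncated files behind. A short Python snippet (nothing SVD-specific, just a file listing) can confirm that all four files arrived with plausible sizes:

import os  # list downloaded checkpoints with their sizes
for name in sorted(os.listdir("checkpoints")):
    path = os.path.join("checkpoints", name)
    print(f"{name}: {os.path.getsize(path) / 1e9:.2f} GB")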
Launch SVD
Copy the demo script into the main directory, then launch it with Streamlit.
cp scripts/demo/video_sampling.py video_sampling.py
streamlit run video_sampling.py
Generating Stock Footage
You have now successfully set up Stable Video Diffusion. From the Streamlit interface, you can generate your own customized stock footage.
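If you'd rather script generation than click through the Streamlit UI, the Hugging Face diffusers library also ships a StableVideoDiffusionPipeline wrapper for these weights. The sketch below assumes you have diffusers, transformers, and accelerate installed in the same environment; input.jpg is a placeholder path for your own conditioning image:

import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

# load the SVD-XT weights from Hugging Face (downloaded on first run)
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.enable_model_cpu_offload()  # trades some speed for lower VRAM usage

# condition the video on a single image; SVD expects 1024x576 input
image = load_image("input.jpg").resize((1024, 576))  # hypothetical path

frames = pipe(image, decode_chunk_size=8).frames[0]  # smaller chunks use less VRAM
export_to_video(frames, "generated.mp4", fps=7)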
References
https://github.com/Stability-AI/generative-models