This repo contains the VPLM Dataset and pretrained checkpoints for RACCooN.
See also: https://github.com/jaehong31/RACCooN
RACCooN is a versatile and user-friendly video-to-paragraph-to-video generative framework that supports multiple video editing capabilities such as removal, addition, and modification, through a unified pipeline. RACCooN consists of two principal stages: Video-to-Paragraph (V2P) and Paragraph-to-Video (P2V).
RACCooN introduces a multi-granular spatiotemporal pooling strategy that generates well-structured video descriptions, capturing both the broad context and object details without requiring complex human annotations. This simplifies precise, text-based video content editing for users. Our video generative model incorporates auto-generated narratives or instructions to enhance the quality and accuracy of the generated content. It supports the addition of video objects, inpainting, and attribute modification within a unified framework, outperforming existing methods on video editing and inpainting benchmarks.
Description of VPLM Dataset
Multi-Objects Description
- Train: RACCooN/VPLM/gt_train.json
- Test: RACCooN/VPLM/gt_test.json
Single-Object Layout Prediction
- Train: RACCooN/VPLM/gt_train_layouts.json
- Test: RACCooN/VPLM/gt_test_layouts.json
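As a minimal sketch of how the annotation files listed above might be read, the snippet below maps each task and split to its path and parses the file with the standard `json` module. The task names, the helper functions `vplm_path` and `load_vplm`, and the `root` argument (a local directory under which the `RACCooN/` paths resolve) are illustrative assumptions, not part of the RACCooN codebase. The JSON schema is not documented here, so the parsed object is returned unchanged.

```python
import json
from pathlib import Path

# Relative paths copied from the file listing above.
# The task keys are hypothetical labels chosen for this sketch.
VPLM_FILES = {
    ("multi_object_description", "train"): "RACCooN/VPLM/gt_train.json",
    ("multi_object_description", "test"): "RACCooN/VPLM/gt_test.json",
    ("single_object_layout", "train"): "RACCooN/VPLM/gt_train_layouts.json",
    ("single_object_layout", "test"): "RACCooN/VPLM/gt_test_layouts.json",
}


def vplm_path(root, task, split):
    """Return the local path of one annotation file under `root`."""
    key = (task, split)
    if key not in VPLM_FILES:
        raise KeyError(f"unknown task/split combination: {key}")
    return Path(root) / VPLM_FILES[key]


def load_vplm(root, task, split):
    """Parse one annotation file and return its content as-is."""
    with open(vplm_path(root, task, split), encoding="utf-8") as f:
        return json.load(f)
```

For example, `load_vplm(".", "single_object_layout", "test")` would read `./RACCooN/VPLM/gt_test_layouts.json`, assuming the files were downloaded into the current directory with their relative paths preserved.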
Description of Model Checkpoints
V2P
Multi-Objects Description
- RACCooN/mllm_finetuned/multi_obj_projector.bin
Single-Object Description
- RACCooN/mllm_finetuned/single_obj_projector.bin
Single-Object Layout Prediction
- RACCooN/mllm_finetuned/layout_pred_projector.bin
P2V
- RACCooN/unet_finetuned/diffusion_pytorch_model.safetensors
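
Before running either stage, it can help to confirm that all four checkpoint files are present locally. The sketch below is one way to do that with the standard library only. The dictionary keys, the function `missing_checkpoints`, and the `root` argument (a local directory under which the `RACCooN/` paths resolve) are assumptions made for illustration. Loading the weights themselves is left to the RACCooN code linked above.

```python
from pathlib import Path

# Relative paths copied from the checkpoint listing above.
# The keys are hypothetical labels chosen for this sketch.
CHECKPOINTS = {
    "v2p_multi_object_description": "RACCooN/mllm_finetuned/multi_obj_projector.bin",
    "v2p_single_object_description": "RACCooN/mllm_finetuned/single_obj_projector.bin",
    "v2p_single_object_layout": "RACCooN/mllm_finetuned/layout_pred_projector.bin",
    "p2v_unet": "RACCooN/unet_finetuned/diffusion_pytorch_model.safetensors",
}


def missing_checkpoints(root):
    """Return the sorted labels of checkpoints not found under `root`."""
    root = Path(root)
    return sorted(
        name for name, rel in CHECKPOINTS.items() if not (root / rel).is_file()
    )
```

An empty list from `missing_checkpoints(".")` would indicate that every listed file exists under the current directory.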