HitPaw-Official/TalkingPhoto

sonic_Combine

Date: 2025-06-16

(1) Algorithm source

https://github.com/jixiaozhong/Sonic

(2) Environment setup

2.1 Python >= 3.10

2.2 Requires a CUDA environment, CUDA >= 11.8

2.3 Install the package dependencies

pip install -r requirements1.txt

2.4 Build the custom ops:

cd src/utilslive/dependencies/XPose/models/UniPose/ops
python setup.py build install
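
The version requirements in 2.1 and 2.2 can be checked before building the ops. The following is a minimal sketch, not part of the project: the helper names are hypothetical, and it assumes PyTorch is among the packages installed from requirements1.txt. Note that `torch.version.cuda` reports the CUDA version PyTorch was built against, which may differ from the locally installed toolkit.

```python
import sys


def parse_version(text):
    """Turn a dotted version string such as '11.8' into a tuple of ints."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def meets_minimum(found, required):
    """Return True when the found version is at least the required one."""
    return parse_version(found) >= parse_version(required)


def check_environment():
    """Collect human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python >= 3.10 is required")
    try:
        import torch  # assumption: installed via requirements1.txt
    except ImportError:
        problems.append("PyTorch is not installed; cannot check CUDA")
        return problems
    cuda = torch.version.cuda  # None for CPU-only builds
    if cuda is None or not torch.cuda.is_available():
        problems.append("no usable CUDA device found")
    elif not meets_minimum(cuda, "11.8"):
        problems.append(f"CUDA {cuda} found, >= 11.8 required")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment looks OK"]:
        print(line)
```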

(3) Usage instructions

3.1 Inference

python sonic_full_inference_v2.py

3.2 Model download

main_package_name
  ├──checkpoints
  │  ├──Sonic
  │  │  ├──audio2bucket.pth
  │  │  ├──audio2token.pth
  │  │  ├──unet.pth
  │  ├──stable-video-diffusion-img2vid-xt
  │  │  ├──...
  │  ├──whisper-tiny
  │  │  ├──...
  │  ├──RIFE
  │  │  ├──flownet.pkl
  │  ├──yoloface_v5m.pt
  ├──pretrained_weights
  │  ├──yolov8x-seg.pt
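
Before running inference, it can help to confirm that the downloaded models match the layout above. This is a minimal sketch under stated assumptions: `missing_models` is a hypothetical helper, not a project function, and the directories whose contents are elided in the tree are only checked for existence.

```python
from pathlib import Path

# Files named explicitly in the layout above.
EXPECTED_FILES = [
    "checkpoints/Sonic/audio2bucket.pth",
    "checkpoints/Sonic/audio2token.pth",
    "checkpoints/Sonic/unet.pth",
    "checkpoints/RIFE/flownet.pkl",
    "checkpoints/yoloface_v5m.pt",
    "pretrained_weights/yolov8x-seg.pt",
]

# Directories whose contents the layout elides; only their presence is checked.
EXPECTED_DIRS = [
    "checkpoints/stable-video-diffusion-img2vid-xt",
    "checkpoints/whisper-tiny",
]


def missing_models(root):
    """Return the relative paths that are absent under the package root."""
    root = Path(root)
    missing = [p for p in EXPECTED_FILES if not (root / p).is_file()]
    missing += [p for p in EXPECTED_DIRS if not (root / p).is_dir()]
    return missing


if __name__ == "__main__":
    for path in missing_models("."):
        print("missing:", path)
```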

3.3 Request parameters

| Parameter | Type | Description |
| --- | --- | --- |
| image_path | string | Input image path |
| audio_path | string | Input audio path |
| output_path | string | Output directory path |
| crop_save_path | string | Save path for the cropped image |
| min_resolution | int | Minimum resolution (default 448) |
| inference_steps | int | Number of inference steps (default 15) |
| animal_signal | bool | Whether to enable animal mode |
| pastback | bool | Whether to paste the result back onto the original image |
| mult_people | bool | Whether to enable multi-person mode |
| dynamic_scale | float | app mode: facial dynamic amplitude parameter |
| face_boxes | List | List of face boxes to use; each element is a BoundingBox structure |
| crop_size_ration | string | ve mode: crop image ratio, such as "448:448" |
| custom_box | BoundingBox | vikapp crop mode: custom crop box, using the BoundingBox structure |
| crop_app | bool | Whether to enable vikapp crop mode |
| no_human_face_run | bool | Whether to continue running when no human face is detected |
| full_image_inference | bool | Whether to process the full image |
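
The request parameters above can be assembled and validated before they are handed to the inference code. The sketch below is illustrative only: `build_request` and `parse_crop_ratio` are hypothetical helpers, the request is modelled as a plain dict because the README does not show how it is passed in, and only the two documented defaults are filled in. The order of the two numbers in `crop_size_ration` is not stated in the README, so the tuple simply preserves it.

```python
# Parameter names taken from the request table.
KNOWN_PARAMETERS = {
    "image_path", "audio_path", "output_path", "crop_save_path",
    "min_resolution", "inference_steps", "animal_signal", "pastback",
    "mult_people", "dynamic_scale", "face_boxes", "crop_size_ration",
    "custom_box", "crop_app", "no_human_face_run", "full_image_inference",
}

# Only these two defaults are documented; the others are left unset.
DOCUMENTED_DEFAULTS = {"min_resolution": 448, "inference_steps": 15}


def parse_crop_ratio(text):
    """Split a ratio string such as '448:448' into a tuple of two ints."""
    first, second = text.split(":")
    return int(first), int(second)


def build_request(image_path, audio_path, output_path, **options):
    """Build a request dict, rejecting names that are not in the table."""
    unknown = set(options) - KNOWN_PARAMETERS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    request = dict(DOCUMENTED_DEFAULTS)
    request.update(
        image_path=image_path, audio_path=audio_path, output_path=output_path
    )
    request.update(options)
    return request


if __name__ == "__main__":
    request = build_request(
        "input.png", "speech.wav", "results", crop_size_ration="448:448"
    )
    print(request)
    print(parse_crop_ratio(request["crop_size_ration"]))
```

Rejecting unknown names early catches misspellings such as `crop_size_ratio`, which differs from the parameter name the table actually uses.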

3.4 Response parameters

| Parameter | Type | Description |
| --- | --- | --- |
| output_video_path | string | Output video path or 'false' |
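
Because `output_video_path` is a string that holds either a path or the literal 'false', callers may want to normalise it. The sketch below assumes the response arrives as a dict and that 'false' means no video was produced; neither is confirmed by the README, and `video_path_or_none` is a hypothetical helper.

```python
from typing import Optional


def video_path_or_none(response: dict) -> Optional[str]:
    """Return the output video path, or None when the value is 'false' or absent."""
    value = response.get("output_video_path")
    if value is None or str(value).strip().lower() == "false":
        return None
    return str(value)


if __name__ == "__main__":
    print(video_path_or_none({"output_video_path": "results/out.mp4"}))
    print(video_path_or_none({"output_video_path": "false"}))
```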

3.5 What's new

  • Reduced memory usage: a result video is now saved every 500 frames. The interval is adjustable through each_process_video_frames_number in Sonic_full.
  • Added the face_boxes, crop_size_ration, no_human_face_run, custom_box, and crop_app options for portrait and animal processing. The table in 3.3 Request parameters describes each parameter.
  • Added image format detection and conversion.
  • Implemented full image inference.
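
The idea behind saving a result video every 500 frames can be illustrated by splitting a frame count into fixed-size segments. This is a sketch of the general chunking technique only, not the code in Sonic_full; `chunk_ranges` is a hypothetical helper whose `chunk_size` argument plays the role of each_process_video_frames_number.

```python
def chunk_ranges(total_frames, chunk_size=500):
    """Yield (start, end) frame index pairs, with end exclusive."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, total_frames, chunk_size):
        yield start, min(start + chunk_size, total_frames)


if __name__ == "__main__":
    # 1200 frames are split into two full segments and one shorter one.
    for start, end in chunk_ranges(1200):
        print(f"process and save frames {start}..{end - 1}")
```

Processing one segment at a time keeps at most `chunk_size` frames in memory, at the cost of writing several partial videos.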

(4) Algorithm information

4.1 Device information

| CPU | GPU |
| --- | --- |
| Intel i7-13700KF | NVIDIA 4090 |

4.2 Runtime information

| Model resolution | Steps | Audio duration | Process time | GPU memory |
| --- | --- | --- | --- | --- |
| 448 × 512 | 15 | 10 s | 145 s | 17 GB |
| 1920 × 1080 | 15 | 15 s | 165 s | 17 GB |