https://github.com/jixiaozhong/Sonic
2.1 Python >= 3.10
2.2 Requires a CUDA environment, CUDA >= 11.8
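
The two requirements above can be checked before installing anything else. The following is a minimal sketch, not part of this repository: the helper names (`parse_version`, `meets_minimum`, `check_environment`) are made up for illustration, and it assumes PyTorch is the library that provides CUDA access here. Note that `torch.version.cuda` reports the CUDA version PyTorch was built against, which may differ from the driver's version.

```python
import sys

MIN_PYTHON = (3, 10)
MIN_CUDA = (11, 8)


def parse_version(text):
    """Turn a version string such as '11.8' into a comparable tuple (11, 8)."""
    return tuple(int(part) for part in text.split(".")[:2])


def meets_minimum(version_text, minimum):
    """Return True when the dotted version string is at least `minimum`."""
    return parse_version(version_text) >= minimum


def check_environment():
    """Collect human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python >= 3.10 is required")
    try:
        import torch  # assumption: PyTorch is part of the installed environment
    except ImportError:
        problems.append("PyTorch is not installed, CUDA version not checked")
        return problems
    cuda_version = torch.version.cuda  # None for CPU-only builds
    if cuda_version is None or not torch.cuda.is_available():
        problems.append("No usable CUDA device found")
    elif not meets_minimum(cuda_version, MIN_CUDA):
        problems.append(f"CUDA >= 11.8 is required, found {cuda_version}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```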
2.3 Install the package environment
pip install -r requirements1.txt
2.4 Build the ops:
cd src/utilslive/dependencies/XPose/models/UniPose/ops
python setup.py build install

python sonic_full_inference_v2.py

- Download the models from https://github.com/jixiaozhong/Sonic
- Download the yolov8x-seg.pt model
main_package_name
├──checkpoints
│ ├──Sonic
│ │ ├──audio2bucket.pth
│ │ ├──audio2token.pth
│ │ ├──unet.pth
│ ├──stable-video-diffusion-img2vid-xt
│ │ ├──...
│ ├──whisper-tiny
│ │ ├──...
│ ├──RIFE
│ │ ├──flownet.pkl
│ ├──yoloface_v5m.pt
├──pretrained_weights
│ ├──yolov8x-seg.pt
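
Before running inference it can help to confirm that the downloaded files match the layout above. This is a minimal sketch, not a script shipped with the project: `missing_entries` is a hypothetical helper, and the two directories whose contents are elided in the tree (`stable-video-diffusion-img2vid-xt`, `whisper-tiny`) are only checked for existence.

```python
from pathlib import Path

# Paths relative to the package root, copied from the directory tree.
EXPECTED = [
    "checkpoints/Sonic/audio2bucket.pth",
    "checkpoints/Sonic/audio2token.pth",
    "checkpoints/Sonic/unet.pth",
    "checkpoints/stable-video-diffusion-img2vid-xt",  # directory, contents not listed
    "checkpoints/whisper-tiny",                       # directory, contents not listed
    "checkpoints/RIFE/flownet.pkl",
    "checkpoints/yoloface_v5m.pt",
    "pretrained_weights/yolov8x-seg.pt",
]


def missing_entries(root):
    """Return the expected files or directories that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    # "." assumes the script is run from the package root (main_package_name).
    for rel in missing_entries("."):
        print("missing:", rel)
```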
| Parameter | Type | Description |
|---|---|---|
| image_path | string | Input image path |
| audio_path | string | Input audio path |
| output_path | string | Output directory path |
| crop_save_path | string | Save path for the cropped image |
| min_resolution | int | Minimum resolution (default 448) |
| inference_steps | int | Number of inference steps (default 15) |
| animal_signal | bool | Whether to enable animal mode |
| pastback | bool | Whether to paste the result back onto the original image |
| mult_people | bool | Whether to enable multi-person mode |
| dynamic_scale | float | app mode, facial dynamic amplitude parameter |
| face_boxes | List | List of face boxes; each element is a BoundingBox structure |
| crop_size_ration | string | ve Mode, crop image ratio, such as "448:448" |
| custom_box | BoundingBox | vikapp crop mode: custom crop box, using the BoundingBox structure |
| crop_app | bool | Whether to enable vikapp crop mode |
| no_human_face_run | bool | Whether to continue running when no face is detected |
| full_image_inference | bool | Whether to process the full image |
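
As an illustration of the table above, the sketch below builds a request as a plain dictionary and checks each value against the listed type. It is an assumption-laden example: how the request is actually sent is not described here, the file paths and the `dynamic_scale` value are made up, and `validate_request` is a hypothetical helper. `face_boxes` and `custom_box` are accepted without type checks because the fields of the BoundingBox structure are not documented in this section.

```python
# Types taken from the request parameters table.
PARAM_TYPES = {
    "image_path": str, "audio_path": str, "output_path": str,
    "crop_save_path": str, "min_resolution": int, "inference_steps": int,
    "animal_signal": bool, "pastback": bool, "mult_people": bool,
    "dynamic_scale": float, "crop_size_ration": str, "crop_app": bool,
    "no_human_face_run": bool, "full_image_inference": bool,
}
# BoundingBox-based parameters: structure not documented, so not type-checked.
UNCHECKED = {"face_boxes", "custom_box"}


def _matches(value, expected):
    """Type check that does not let bool pass as int or float."""
    if expected is bool:
        return isinstance(value, bool)
    if isinstance(value, bool):
        return False
    if expected is float:
        return isinstance(value, (int, float))
    return isinstance(value, expected)


def validate_request(params):
    """Return a list of error strings; an empty list means the request looks valid."""
    errors = []
    for name, value in params.items():
        if name in UNCHECKED:
            continue
        expected = PARAM_TYPES.get(name)
        if expected is None:
            errors.append(f"unknown parameter: {name}")
        elif not _matches(value, expected):
            errors.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors


# Hypothetical request; every path and value here is illustrative only.
example_request = {
    "image_path": "inputs/portrait.png",
    "audio_path": "inputs/speech.wav",
    "output_path": "outputs/",
    "min_resolution": 448,
    "inference_steps": 15,
    "animal_signal": False,
    "pastback": True,
    "dynamic_scale": 1.0,
    "crop_size_ration": "448:448",
}

if __name__ == "__main__":
    print(validate_request(example_request))
```
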
| Parameter | Type | Description |
|---|---|---|
| output_video_path | string | Output video path or 'false' |
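
Because `output_video_path` holds either a path or the string 'false', callers need to tell the two cases apart. A minimal sketch, assuming the response arrives as a dictionary and that 'false' means no video was produced (both are assumptions, as is the helper name `extract_video_path`):

```python
def extract_video_path(response):
    """Return the output video path, or None when the response carries 'false'."""
    value = response.get("output_video_path")
    if value is None or str(value).strip().lower() == "false":
        return None
    return str(value)


if __name__ == "__main__":
    # Hypothetical responses for illustration.
    print(extract_video_path({"output_video_path": "outputs/result.mp4"}))
    print(extract_video_path({"output_video_path": "false"}))
```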
- Reduced memory usage: a result video is now saved every 500 frames, and the chunk size can be adjusted through each_process_video_frames_number in Sonic_full.
- Added processing options for portraits and animals: face_boxes, crop_size_ration, no_human_face_run, custom_box and crop_app. The 3.3 Request parameters table describes each parameter.
- Added image format detection and conversion.
- Implemented full image inference.
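
The chunked saving mentioned in the first item can be pictured as splitting the frame sequence into fixed-size ranges, so that only one chunk is held in memory at a time. This is a sketch of the general idea only, not the code in Sonic_full; `chunk_ranges` is a hypothetical helper, and its `chunk_size` argument plays the role of each_process_video_frames_number.

```python
def chunk_ranges(total_frames, chunk_size=500):
    """Yield (start, end) frame index pairs, end exclusive, covering all frames."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, total_frames, chunk_size):
        yield start, min(start + chunk_size, total_frames)


if __name__ == "__main__":
    # 1200 frames with the default chunk size of 500.
    for start, end in chunk_ranges(1200):
        print(f"process and save frames {start}..{end - 1}")
```

With 1200 frames this yields the ranges (0, 500), (500, 1000) and (1000, 1200), so the last chunk is simply shorter than the others.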
4.1 Device information
| CPU | GPU |
|---|---|
| Intel i7-13700KF | NVIDIA 4090 |
4.2 Running information
| model | resolution | steps | audio duration | process time | GPU memory |
|---|---|---|---|---|---|
| | 448 | 15 | 10s | 145s | 17G |
| | 1920 | 15 | 15s | 165s | 17G |
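
To compare the two runs, the process time can be normalized by the audio duration. The short sketch below does only that arithmetic on the figures from the table; the dictionary keys are made up for illustration.

```python
# Figures copied from the running information table.
runs = [
    {"resolution": 448, "audio_s": 10, "process_s": 145},
    {"resolution": 1920, "audio_s": 15, "process_s": 165},
]


def seconds_per_audio_second(run):
    """Processing time needed for each second of input audio."""
    return run["process_s"] / run["audio_s"]


if __name__ == "__main__":
    for run in runs:
        ratio = seconds_per_audio_second(run)
        print(f"resolution {run['resolution']}: {ratio} s per second of audio")
```

This gives 14.5 s per second of audio for the 448 run and 11.0 s for the 1920 run.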