Clone any voice and let your bot speak with it.
SoulSaying is an OpenClaw skill that adds voice messaging to your bot using SiliconFlow's TTS engine with voice cloning support. Works with Feishu, Telegram, Discord, and WhatsApp.
- 🎤 Voice Cloning — Upload a 10-30s audio sample, get a cloned voice
- 🗣️ Text-to-Speech — Convert any text to natural speech
- 💬 Multi-Platform — Feishu, Telegram, Discord, WhatsApp
- 🔀 Mode Switching — Users can toggle between text and voice modes
- 🆓 Free Tier Available — SiliconFlow offers free API credits
- 🇨🇳 China-friendly — No VPN needed, all APIs are domestic
User message → Bot generates text → SiliconFlow TTS (cloned voice) → mp3
→ ffmpeg → platform format → Upload → Send as voice message
Supported: Feishu · Telegram · Discord · WhatsApp · Local playback
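The ffmpeg step in the pipeline above could be sketched roughly as follows. The function names and the platform-to-codec mapping are assumptions made for illustration (Opus in an Ogg container for Feishu, Telegram, and WhatsApp voice messages; unchanged mp3 for Discord attachments and local playback). They are not taken from the skill's own scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose ffmpeg output arguments for a target platform.
# The mapping below is an assumption, not the skill's actual behavior.
ffmpeg_args_for() {
  case "$1" in
    feishu|telegram|whatsapp)
      # Opus audio in an Ogg container, mono, low bitrate suited to speech.
      echo "-c:a libopus -b:a 32k -ac 1 -f ogg" ;;
    discord|local)
      # Sent as a plain attachment or played locally: keep the mp3 stream.
      echo "-c:a copy -f mp3" ;;
    *)
      echo "unknown platform: $1" >&2
      return 1 ;;
  esac
}

# Convert the mp3 produced by the TTS step into the platform's format.
convert_for_platform() {
  local platform="$1" input="$2" output="$3" args
  args="$(ffmpeg_args_for "$platform")" || return 1
  # $args is left unquoted on purpose so it splits into separate options.
  # shellcheck disable=SC2086
  ffmpeg -y -loglevel error -i "$input" $args "$output"
}
```

For example, `convert_for_platform telegram /tmp/test.mp3 /tmp/test.ogg` would re-encode the TTS output before upload, provided ffmpeg was built with libopus.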
Copy the skill-soulsaying folder into your OpenClaw workspace:
cp -r skill-soulsaying /path/to/your/openclaw-workspace/skills/
cp skills/skill-soulsaying/config.env.example skills/skill-soulsaying/config.env
# Edit config.env with your API keys
Extract audio from any video using abcdtools:
- Duration: 10-30 seconds
- Quality: Clear speech, single speaker, no background music
- Format: mp3 or wav
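Before uploading, the format and duration requirements above can be checked locally. This is a minimal sketch: `sample_ok` and `sample_duration` are hypothetical helper names, not part of the skill, and the duration is read with `ffprobe`, which is installed alongside ffmpeg.

```shell
#!/usr/bin/env bash
# Hypothetical check: accept a sample only if it is mp3/wav and 10-30 s long.
# Usage: sample_ok <file> <duration-in-seconds>
sample_ok() {
  local file="$1" seconds="$2"
  case "${file##*.}" in
    mp3|wav) ;;
    *) return 1 ;;
  esac
  # awk handles fractional durations such as 12.48; d+0 forces a numeric compare.
  awk -v d="$seconds" 'BEGIN { exit !(d+0 >= 10 && d+0 <= 30) }'
}

# Print the duration of an audio file in seconds using ffprobe.
sample_duration() {
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Example:
#   sample_ok sample.mp3 "$(sample_duration sample.mp3)" || echo "sample rejected"
```

The check cannot judge speech clarity, speaker count, or background music; those still need a listen.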
bash skills/skill-soulsaying/scripts/clone_voice.sh sample.mp3 my-voice
# Copy the returned voice URI into config.env
# Generate speech
bash skills/skill-soulsaying/scripts/tts.sh "Hello world" /tmp/test.mp3
# Send to Feishu
bash skills/skill-soulsaying/scripts/speak.sh "你好,语音模式已开启"
# Or Telegram / Discord / WhatsApp
bash skills/skill-soulsaying/scripts/speak.sh "Hello" telegram
bash skills/skill-soulsaying/scripts/speak.sh "Hello" discord
bash skills/skill-soulsaying/scripts/speak.sh "Hello" whatsapp
| Requirement | How to Get |
|---|---|
| SiliconFlow API Key | Free at siliconflow.cn |
| ffmpeg | brew install ffmpeg (macOS) / apt install ffmpeg (Linux) |
| A bot (at least one) | Feishu · Telegram · Discord · WhatsApp |
| OpenClaw | github.com/openclaw/openclaw |
skill-soulsaying/
├── SKILL.md # Skill definition (OpenClaw reads this)
├── config.env.example # Configuration template
├── scripts/
│ ├── clone_voice.sh # Upload sample → get voice URI
│ ├── tts.sh # Text → speech audio
│ ├── send_feishu_voice.sh # Audio → Feishu voice message
│ ├── send_telegram_voice.sh # Audio → Telegram voice message
│ ├── send_discord_voice.sh # Audio → Discord audio attachment
│ ├── send_whatsapp_voice.sh # Audio → WhatsApp voice message
│ ├── speak.sh # One-step: text → platform voice
│ ├── list_voices.sh # List your cloned voices
│ └── delete_voice.sh # Remove a cloned voice
└── references/
└── api-notes.md # SiliconFlow & Feishu API reference
Add voice mode instructions to your bot's SOUL.md:
- User says "语音模式" / "voice on" → bot replies with text + voice
- User says "文字模式" / "voice off" → bot replies with text only
See SKILL.md for the exact instructions to paste.
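The skill implements this toggle through the instructions in SOUL.md, not through a script. Purely as an illustration of the rule, the same logic written as a shell function might look like the sketch below. `next_mode` is a hypothetical name, and leaving the mode unchanged on any other message is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: map a user message to the next reply mode.
# Prints "voice" or "text"; any other message keeps the current mode.
next_mode() {
  local current="$1" message="$2"
  case "$message" in
    "语音模式"|"voice on")  echo "voice" ;;
    "文字模式"|"voice off") echo "text" ;;
    *)                      echo "$current" ;;
  esac
}

# Example: next_mode text "voice on" prints "voice".
```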
Don't have a voice sample? Use SiliconFlow's preset voices:
| Voice | Style |
|---|---|
| bella | Warm female |
| claire | Clear female |
| anna | Sweet female |
| alex | Neutral |
Set VOICE_URI="FunAudioLLM/CosyVoice2-0.5B:bella" in config.env.
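A small helper can build the value for any preset in the table. This sketch assumes the other presets follow the same `model:name` pattern as the `bella` example; `preset_voice_uri` is a hypothetical name, not one of the skill's scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a VOICE_URI for one of the preset voices.
# Assumes every preset uses the same "model:name" pattern as bella.
preset_voice_uri() {
  local model="FunAudioLLM/CosyVoice2-0.5B"
  case "$1" in
    bella|claire|anna|alex) echo "${model}:$1" ;;
    *) echo "unknown preset: $1" >&2; return 1 ;;
  esac
}

# Example: print a line that can be pasted into config.env.
#   echo "VOICE_URI=\"$(preset_voice_uri claire)\""
```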
PRs welcome! Ideas for improvement:
- Support for more TTS providers (Edge TTS, Bark, etc.)
- Support for more messaging platforms (Signal, Slack, Line, etc.)
- Streaming voice for long text
- Voice effect presets (speed, pitch)
MIT
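Clone any voice and let your Feishu bot speak with it.
SoulSaying is an OpenClaw skill that gives your Feishu bot voice replies, using SiliconFlow's TTS engine and voice cloning.
- 🎤 Voice Cloning — Upload 10-30 seconds of audio to clone any voice
- 🗣️ Text-to-Speech — Natural, fluent Chinese speech
- 💬 Multi-Platform — Integrates with Feishu, Telegram, Discord, and WhatsApp
- 🔀 Mode Switching — Users can switch between text and voice modes
- 🆓 Free to Use — SiliconFlow provides free credits
- 🇨🇳 Hosted in China — No VPN needed, all APIs are domestic
Use abcdtools to extract audio from a video:
- 10-30 seconds of clear speech
- No background music
- Single speaker
See the Quick Start section above for details.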
Made with 🦐 by an interdimensional lobster