Below is the core WebRTC connection code. It sets up an RTVIClient instance with configurable parameters for audio playback and event handling:
const rtviClient = new RTVIClient({
  transport,
  params: {
    baseUrl: "http://localhost:7860/",
  },
  enableMic: true,
  enableCam: false,
  timeout: 30 * 1000,
});
This snippet includes event handlers for:
- Setting up audio playback
- Handling user interface interactions
Rename the sample environment file and populate the required variables:
mv sample.env .env
Fill in the following variables:
- NGROK_URL: Generated using ngrok http 7860
- DAILY_API_KEY: Obtain from daily.co
- OPEN_API_KEY: Obtain from OpenAI
- GEMINI_API_KEY: Obtain from Google Console
- Twilio variables: Obtain from twilio.com
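Before starting the server, it can help to confirm that none of the variables listed above were left empty. The following is a minimal sketch, not part of the project: `parse_env_text` and `missing_env_vars` are hypothetical helpers, the parser only handles simple `KEY=VALUE` lines, and the Twilio variables are left out because their exact names are not given here.

```python
from pathlib import Path

# Names copied from the list above. Twilio variables are omitted because
# their exact names are not stated in this README.
REQUIRED_VARS = ["NGROK_URL", "DAILY_API_KEY", "OPEN_API_KEY", "GEMINI_API_KEY"]


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_env_vars(env):
    """Return the required names that are absent or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.exists():
        missing = missing_env_vars(parse_env_text(env_file.read_text()))
        if missing:
            print("Missing variables: " + ", ".join(missing))
        else:
            print("All required variables are set.")
    else:
        print("No .env file found in the current directory.")
```

Run it from the directory that contains the `.env` file. It only checks that values are present, not that the keys are valid.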
Navigate to the server directory, install dependencies, and start the server:
cd server
pip3 install -r requirements.txt
python server.py
Install dependencies and start the development server:
npm i
npm run dev
Alternatively, deploy through a hosting platform such as Vercel to automate the process. Deploying on a server without SSL will cause CORS issues.
Note: The web app cannot be run directly in the browser without a customer ID. Use the hosted version at: DialMateAI
To test the Phone AI Agent, use the following cURL command:
curl -X POST https://<ngrokurl>/make-call \
-H "Content-Type: application/json" \
-d '{"to_phone_number": "+9111111"}'
Replace +9111111 with your phone number. Note: Only numbers verified in the Twilio console can be called.
To have your number verified, contact us directly at arsh0javed@gmail.com
Otherwise, use our web app to test: DialMateAI
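The cURL request above can also be sent from a script. This is a minimal sketch using only the Python standard library; `build_make_call_request` and `make_call` are hypothetical helper names, and because the shape of the server's response is not documented here, the raw body is returned as text.

```python
import json
import urllib.request


def build_make_call_request(base_url, to_phone_number):
    """Build the same POST request that the cURL command sends."""
    payload = json.dumps({"to_phone_number": to_phone_number}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/make-call",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def make_call(base_url, to_phone_number, timeout=30):
    """Send the request and return (HTTP status, response body as text)."""
    request = build_make_call_request(base_url, to_phone_number)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")


# Example (replace the placeholder URL and number with your own):
# status, body = make_call("https://<ngrokurl>", "+9111111")
```

As with the cURL command, the call only goes through if the target number is verified in the Twilio console.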
- This submission showcases a seamless integration of WebRTC, Twilio, and AI capabilities.
- The hosted web app allows users to experience the application without complex setup.
                                 ┌─────────────────────────────────────────┐
                                 │                                         │
                                 │ Server                                  │
                                 │                                         │
                                 │                                         │
                                 │                 ┌────────────────────┐  │
                                 │                 │                    │  │
                                 │                 │       Pipecat      │  │
                                 │                 │       Pipeline     │  │
                                 │                 │                    │  │
                                 │                 │                    │  │
┌──────────────────────────┐     │                 │  Audio Processing  │  │
│                          │     │                 │         ▼          │  │
│      Pipecat Client      │     │   ┌─────────────│   Gemini Flash    ─┼──┼────►
│   ┌───────────────┐      │     │   │             │   Transcription   ◄┼──┼─────
│   │ WebRTC (Daily)│  ────┼────────►│WebRTC (Daily)         ▼          │  │
│   │   Transport   │  ◄───┼─────────│  Transport  │  Gemini Multimodal─┼──┼────►
│   └───────────────┘      │     │   │             │   Live API        ◄┼──┼─────
│                          │     │   └─────────────│         ▼          │  │
└──────────────────────────┘     │                 │   Gemini Flash    ─┼──┼────►
                                 │                 │   Transcription   ◄┼──┼─────
                                 │                 │         ▼          │  │
                                 │                 │   Conversation     │  │
                                 │                 │     Context        │  │
                                 │                 │    Management      │  │
                                 │                 │         ▼          │  │
                                 │                 │   RTVI Events      │  │
                                 │                 │                    │  │
                                 │                 └────────────────────┘  │
                                 │                                         │
                                 └─────────────────────────────────────────┘
- Function (Tool) Calling
- Low latency with cloud WebRTC
- Integration with Twilio for high-quality phone calls
- OpenAI and Gemini Realtime Support
- Voice Activity Detection (VAD)
- Configurable Natural Sounding Voices
- Highly intelligent sales agent
Note: Since the Phone AI Agent cannot be run without a verified number, here is an example demo: https://youtube.com/shorts/r8-pi9Vtf7w?si=5_790MHq0RiQ6PRP
Thank You.
