Skip to content

ArshCypherZ/dialmate-backend

Repository files navigation

Hackathon Submission: WebRTC Connection with Phone AI Agent

WebRTC Client Code

Below is the core WebRTC connection code. It sets up an RTVIClient instance with configurable parameters for audio playback and event handling:

const rtviClient = new RTVIClient({
  transport,
  params: {
    baseUrl: "http://localhost:7860/", 
  },
  enableMic: true,
  enableCam: false,
  timeout: 30 * 1000,
});

This snippet includes event handlers for:

  • Setting up audio playback
  • Handling user interface interactions

Environment Setup

Step 1: Configure Environment Variables

Rename the sample environment file and populate the required variables:

mv sample.env .env

Fill in the following variables:

  • NGROK_URL: Generated using ngrok http 7860
  • DAILY_API_KEY: Obtain from daily.co
  • OPEN_API_KEY: Obtain from OpenAI
  • GEMINI_API_KEY: Obtain from Google Console
  • Twilio variables: Obtain from twilio.com

Step 2: Run Gemini Backend Server

Navigate to the server directory, install dependencies, and start the server:

cd server
pip3 install -r requirements.txt
python server.py

Web Application Setup

Step 1: Install and Run the Web App

Install dependencies and start the development server:

npm i
npm run dev

Or it is better to use some deployment sites like vercel for automation. Else there will be CORS issue if deployed on a non SSL server.

Note: The web app cannot be run directly in the browser without a customer ID. Use the hosted version at: DialMateAI


Phone AI Agent

Step 1: Make a Test Call

To test the Phone AI Agent, use the following cURL command:

curl -X POST https://<ngrokurl>/make-call \
-H "Content-Type: application/json" \
-d '{"to_phone_number": "+9111111"}'

Replace +9111111 with your phone number. Note: Only verified numbers from the Twilio console can be called.

Step 2: Test the phone ai agent

For your number to be verified, contact us directly at arsh0javed@gmail.com

Therefore, use our web app to test which is DialMateAI

Additional Notes

  • This submission showcases a seamless integration of WebRTC, Twilio, and AI capabilities.
  • The hosted web app allows users to experience the application without complex setup.

Architecture for Gemini + WebRTC


                                 ┌─────────────────────────────────────────┐     
                                 │                                         │     
                                 │ Server                                  │     
                                 │                                         │     
                                 │                                         │     
                                 │                 ┌────────────────────┐  │     
                                 │                 │                    │  │     
                                 │                 │  Pipecat           │  │     
                                 │                 │  Pipeline          │  │     
                                 │                 │                    │  │     
                                 │                 │                    │  │     
┌──────────────────────────┐     │                 │  Audio Processing  │  │     
│                          │     │                 │         ▼          │  │     
│      Pipecat Client      │     │   ┌─────────────│   Gemini Flash    ─┼──┼────►
│    ┌───────────────┐     │     │   │             │   Transcription   ◄┼──┼─────
│    │ WebRTC (Daily)│ ────┼────────►│WebRTC (Daily)         ▼          │  │     
│    │   Transport   │ ◄───┼─────────│  Transport  │  Gemini Multimodal─┼──┼────►
│    └───────────────┘     │     │   │             │     Live API      ◄┼──┼─────
│                          │     │   └─────────────│         ▼          │  │     
└──────────────────────────┘     │                 │   Gemini Flash    ─┼──┼────►
                                 │                 │   Transcription   ◄┼──┼─────
                                 │                 │         ▼          │  │     
                                 │                 │   Conversation     │  │     
                                 │                 │     Context        │  │     
                                 │                 │    Management      │  │     
                                 │                 │         ▼          │  │     
                                 │                 │   RTVI Events      │  │     
                                 │                 │                    │  │     
                                 │                 └────────────────────┘  │     
                                 │                                         │     
                                 └─────────────────────────────────────────┘  

Architecture for OpenAI + Twilio

Flowchart

Features

  • Function (Tool) Calling
  • Low latency with cloud WebRTC
  • Integration with Twilio for high quality phone calls
  • OpenAI and Gemini Realtime Support
  • Voice Activity Detection (VAD)
  • Configurable Natural Sounding Voices
  • Highly intelligent sales agent

Note: Since you cannot run the phone ai agent without getting verified, here is an example demo: https://youtube.com/shorts/r8-pi9Vtf7w?si=5_790MHq0RiQ6PRP

Thank You.

About

Backend for calling AI sales agent.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors