Audio-to-3D Face SDK for Unity enables developers to generate real-time 3D facial animations from speech audio using our AI-powered API. Simply feed in an audio clip or audio stream, and the SDK returns a sequence of facial blendshape coefficients that can be applied to any 3D character model — bringing your avatars to life with natural, speech-synchronized expressions.
⚠️ This repository is provided as a sample. Replace the API endpoints, keys, and assets with your own when integrating into a real application.
Demo video: `audiotoface_demo.mp4`
- Microphone recording with automatic WAV conversion (`AudioRecord`)
- Login and HTTP request handling via `APIManager` and `APISettingsConfig`
- Streaming of audio and emote frames into your scene using `AudioToFaceManager`
- Sample MonoBehaviour (`AudioToFaceSample`) with UI buttons for recording, sending audio, and selecting local WAV files
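For orientation, here is a minimal sketch of the capture-and-encode flow that `AudioRecord` presumably implements, built on Unity's `Microphone` API. The class and method names below are hypothetical, and the real component may differ in buffer length, sample rate, and error handling:

```csharp
using System;
using System.IO;
using UnityEngine;

// Hypothetical illustration of AudioRecord's flow: capture from the default
// microphone, encode to 16-bit PCM WAV, and return a base64 string.
public class SimpleMicCapture : MonoBehaviour
{
    AudioClip _clip;

    public void StartCapture()
    {
        // null = default device; 10 s ring buffer at 16 kHz mono (assumed settings).
        _clip = Microphone.Start(null, false, 10, 16000);
    }

    public string StopCaptureToBase64Wav()
    {
        int position = Microphone.GetPosition(null); // samples recorded so far
        Microphone.End(null);

        var samples = new float[position * _clip.channels];
        _clip.GetData(samples, 0);
        return Convert.ToBase64String(EncodeWav(samples, _clip.frequency, _clip.channels));
    }

    static byte[] EncodeWav(float[] samples, int sampleRate, int channels)
    {
        using (var ms = new MemoryStream())
        using (var w = new BinaryWriter(ms))
        {
            int byteCount = samples.Length * 2; // 16-bit PCM
            w.Write(System.Text.Encoding.ASCII.GetBytes("RIFF"));
            w.Write(36 + byteCount);
            w.Write(System.Text.Encoding.ASCII.GetBytes("WAVEfmt "));
            w.Write(16);                        // fmt chunk size
            w.Write((short)1);                  // PCM format
            w.Write((short)channels);
            w.Write(sampleRate);
            w.Write(sampleRate * channels * 2); // byte rate
            w.Write((short)(channels * 2));     // block align
            w.Write((short)16);                 // bits per sample
            w.Write(System.Text.Encoding.ASCII.GetBytes("data"));
            w.Write(byteCount);
            foreach (var s in samples)
                w.Write((short)(Mathf.Clamp(s, -1f, 1f) * short.MaxValue));
            return ms.ToArray();
        }
    }
}
```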
Getting started:

- Open the project in Unity (2020.3+ recommended).
- Configure the `APISettingsConfig` asset (a default configuration is provided for demonstration; you can modify it directly or create your own):
  - Get your `Access Key` and `Secret Key` from the `Developers` page at https://www.sumeruai.us/.
  - Option A: modify the default configuration at `Assets/SumeruAI/Resources` – fill in the `Access Key`, `Secret Key`, and the base URL plus relative paths for `login` and `atfMesh`.
  - Option B: create a new configuration – in the Project window, `Right-click → Create → SumeruAI → API Settings Config`, then fill in the required API credentials and paths from the `Developers` page.
  - The asset must reside under a `Resources` folder so it can be loaded at runtime (see the loading sketch after this list).
- Scene preparation:
  - Open the sample scene: `Assets/SumeruAI/Samples/Scenes/AudioToFace.unity`
  - The scene already includes a GameObject with `AudioToFaceSample` attached.
- Run the sample:
  - Press Play in the Editor.
  - Use the UI buttons: "Start Record" to begin recording, "Stop Record" to stop and send the audio, and "Select Local Audio" (Editor only) to pick a WAV file and send it.
  - After the server responds, the manager enqueues the audio and blendshape data; the face should animate and the audio should play back.
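Because the asset is resolved by name at runtime, loading it might look like this minimal sketch; the asset file name `"APISettingsConfig"` is an assumption – use whatever you named your own asset:

```csharp
using UnityEngine;

// Minimal sketch: the config asset must live under a Resources folder so it
// can be loaded by name at runtime. The asset name below is an assumption.
public static class ApiConfigLoader
{
    public static APISettingsConfig Load()
    {
        var config = Resources.Load<APISettingsConfig>("APISettingsConfig");
        if (config == null)
            Debug.LogError("APISettingsConfig not found under a Resources folder.");
        return config;
    }
}
```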
Build targets: the sample currently supports the Editor and Windows (an `Application.platform` check gates where files are written). Add platform-specific paths if needed; a sketch follows.
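One way to extend the platform check is a small path helper like the one below. This is a sketch under the assumption that you want a writable location on every target; the sample's own path logic may differ:

```csharp
using System.IO;
using UnityEngine;

public static class SavePaths
{
    // Sketch only: picks a writable location per platform.
    // persistentDataPath is the safe choice on mobile and console targets.
    public static string GetWritablePath(string fileName)
    {
#if UNITY_EDITOR || UNITY_STANDALONE_WIN
        // Mirrors the sample's current Editor/Windows support.
        return Path.Combine(Application.dataPath, "..", fileName);
#else
        return Path.Combine(Application.persistentDataPath, fileName);
#endif
    }
}
```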
Data classes defined in `RequestUtil.cs` are serialized as JSON for communication:

```csharp
[Serializable]
public class ATFReqData
{
    public string status;             // "start" or "stop" etc.
    public string dialogueBase64;     // base64-encoded WAV bytes
    public string lastDialogueBase64; // optional
    public string traceId;
}

[Serializable]
public class ATFRepData : BaseResponse
{
    public ATFRepBodyData data;
}

[Serializable]
public class ATFRepBodyData
{
    public long id;
    public string emoteKey; // base64-encoded float array for blendshapes
    public string audioKey; // base64-encoded WAV bytes
    public float fps;
}
```

The sample sends requests via `APIManager.Request<TReq,TRep>`, which automatically attaches the access token obtained from `Login()`.
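As a usage sketch, a request payload can be assembled like this before handing it to `APIManager.Request<ATFReqData, ATFRepData>`. The `traceId` format is an assumption (any unique string should do), and the exact `Request` call signature is not shown here – check `APIManager` for the real entry point:

```csharp
using System;

public static class AtfRequestExample
{
    // Builds the JSON payload defined above from an in-memory WAV file.
    // Field semantics follow the comments in RequestUtil.cs.
    public static ATFReqData BuildRequest(byte[] wavBytes)
    {
        return new ATFReqData
        {
            status = "start",
            dialogueBase64 = Convert.ToBase64String(wavBytes),
            lastDialogueBase64 = null,              // optional: previous utterance, if any
            traceId = Guid.NewGuid().ToString("N")  // assumed: any unique id works
        };
    }
}
```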
Extension points:

- Multiple characters: call `AudioToFaceManager.GetInstance().RegisterModel` with distinct IDs.
- Custom recorders: replace `AudioRecord` with your own microphone or file logic – just call `AudioToFaceManager.AddAudioFaceData` with the returned base64 strings.
- UI hooks: subscribe to `AudioToFaceManager` events (`StartSpeechEvent`, `StopSpeechEvent`, `StopMotionEvent`, etc.) for in-game notifications (see the sketch after this list).
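A hedged sketch of wiring UI notifications to the manager's events follows. The event names come from this README, but the parameterless `Action`-style signatures are assumptions – check `AudioToFaceManager` for the exact delegates:

```csharp
using UnityEngine;

// Subscribes to speech events while this component is enabled.
// Unsubscribing in OnDisable avoids dangling handlers on scene changes.
public class SpeechUiNotifier : MonoBehaviour
{
    void OnEnable()
    {
        var mgr = AudioToFaceManager.GetInstance();
        mgr.StartSpeechEvent += OnStartSpeech;
        mgr.StopSpeechEvent += OnStopSpeech;
    }

    void OnDisable()
    {
        var mgr = AudioToFaceManager.GetInstance();
        mgr.StartSpeechEvent -= OnStartSpeech;
        mgr.StopSpeechEvent -= OnStopSpeech;
    }

    void OnStartSpeech() => Debug.Log("Avatar started speaking");
    void OnStopSpeech() => Debug.Log("Avatar stopped speaking");
}
```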
- All networking is asynchronous; callbacks are executed on the Unity main thread.
- The manager converts the server's byte arrays into float arrays and then into `EmoteData` objects for playback (a decode sketch follows this list).
- The sample includes minimal error logging; expand it when integrating into production.
- Editor-only code uses `UnityEditor` APIs guarded by `#if UNITY_EDITOR`.
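The decode step the manager performs might look like the following sketch: the base64 `emoteKey` holds a packed float array of blendshape coefficients, with the server's `fps` field driving playback timing. Little-endian packing and the per-frame layout are assumptions here:

```csharp
using System;

public static class EmoteDecode
{
    // Decodes the base64 emoteKey into a flat float array of blendshape
    // coefficients. Assumes little-endian IEEE-754 floats on the wire.
    public static float[] DecodeEmoteKey(string emoteKeyBase64)
    {
        byte[] bytes = Convert.FromBase64String(emoteKeyBase64);
        var floats = new float[bytes.Length / sizeof(float)];
        Buffer.BlockCopy(bytes, 0, floats, 0, floats.Length * sizeof(float));
        return floats;
    }
}
```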
Feel free to fork and adapt this sample for your own AudioToFace workflows. Contributions and issues are welcome!