Automatic Speech Recognition (ASR)
We support the following third-party service providers for ASR services:

| ASR | On-Prem / Cloud | Languages | Regions | Word Error Rate (WER) | Comments |
|---|---|---|---|---|---|
| Google | Cloud | Supported Languages | Locations v2, Docs, Regions | 4-9% | - Good for short utterances (for example, “yes”, “no”). - Works well for numeric and alphanumeric inputs (IDs, SSN). - Supports class tokens for output formatting. - Supports hints and hint-boosts. - Extensive language support. |
| Deepgram | Cloud & On-Prem | Supported Languages | Supports all regions globally | 3.44% | - Supports hints. - Provides custom models via Deepgram team. - Supports smart formatting (numbers, dates). |
| Azure | Cloud & On-Prem | Supported Languages | Regions | 5-10% | - Preferred ASR provider; default for new accounts. - Low WER with flexible customization. - Supports hints. - Extensive language support. - Supports custom model creation via Azure portal. |
| Nvidia Riva (Nvidia) | On-Prem | ASR Overview | - | - 67% | |
| Amivoice ASR (Advanced Media Inc) | Cloud | Supported Languages | Primarily Japan-based processing and storage | N/A | |
| Amazon Transcribe | Cloud | Supported Languages | Regions | - 60% | |
| gnani.ai | Cloud & On-Prem | Supported Languages | Deployable in customer-specified regions (private cloud or on-premises) | 2% | |
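
For readers comparing the WER column, the following is a minimal sketch of how word error rate is commonly computed: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the ASR output, divided by the number of reference words. The function name `word_error_rate` and the sample utterances are illustrative assumptions. This page does not state which test sets or text normalization produced the figures in the table, so results from this sketch are not directly comparable to them.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = word-level edit distance / number of reference words.

    Illustrative helper (not part of any provider SDK). Words are split on
    whitespace with no case or punctuation normalization, which is an
    assumption; real evaluations usually normalize text first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word ("me") and one substituted word ("billing" -> "building")
    # against a five-word reference.
    wer = word_error_rate(
        "please transfer me to billing",
        "please transfer to building",
    )
    print(f"WER: {wer:.0%}")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.
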

Text to Speech (TTS)
We support the following third-party service providers for TTS services:

| TTS | On-Prem / Cloud | Languages | Regions | Comments |
|---|---|---|---|---|
| Google | Cloud | Supported Voices | Operates within Google Cloud’s global infrastructure | |
| Azure | Cloud & On-Prem | Supported Languages | Regions | - Extensive language support. - Large number of voices. - Supports custom voice creation through the portal. - Supports SSML (limited to Azure-supported tags). |
| OpenAI TTS | Cloud | Supported Languages | - | - Human-like voices. - Limited number of voices. |
| Eleven Labs | Cloud | Docs | - | - Human-like voices. - Supports speed, temperature, and stability controls. - Supports voice cloning with 30-60 second samples. |
| AWS | Cloud | Supported Languages | Regions | |
| gnani.ai | Cloud & On-Prem | API Service | - | |
| Deepgram | Cloud & On-Prem | Supported Languages | - | - Limited number of languages. - Human-like voices. |
| Nvidia Riva TTS | On-Prem | - | - | - |

Voice Biometrics
We support the following third-party service providers for voice biometrics:

| Voice Biometric Vendor | Voice Biometric Engine | On-Prem / Cloud | Comments |
|---|---|---|---|
| ID R&D | ID Voice | - | - |