Best Practices
This section provides guidelines for achieving the best possible results when using the API.
Uniqueness
Use custom vocabulary only for truly unique or rare terms that you believe are not in our massive dictionary of 500,000+ words. Examples of good custom vocabulary terms include made-up words or words with unique spelling, such as "sparkletini", "timi", "Rob Szypko", or "Ginnifer". Examples of bad custom vocabulary terms that you should not include are "Maybelline", "e pluribus unum", or "orthostatic hypotension".
Significance
We recommend submitting a short list of target terms (no more than 500 phrases), as large lists may negatively impact performance and accuracy. Focus on the terms that you know will persist and that you want the model to get right; additional terms that don't appear in the audio may actually increase the Word Error Rate (WER). Remember that you can always update your custom vocabulary list to include additional words that you recognize as important but that the ASR model may have missed.
Length
Short phrases do better than long phrases, so keep your phrases short where possible. Avoid sentences and long phrases: phrases are boosted as n-gram terms, and performance degrades past 5 words per phrase. For example, instead of the phrase "getting started with food psych and body positivity", use the shorter phrase "food psych".
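The Significance and Length guidelines can be checked mechanically before a list is submitted. The following is a minimal sketch in Python, not part of the Rev AI API: the function name lint_vocabulary and the two constants are hypothetical, and the limits are simply the 500-phrase and 5-word figures quoted above.

```python
MAX_PHRASES = 500          # list-size guideline from "Significance"
MAX_WORDS_PER_PHRASE = 5   # phrase-length guideline from "Length"


def lint_vocabulary(phrases):
    """Return a list of human-readable warnings for a custom vocabulary list."""
    warnings = []
    if len(phrases) > MAX_PHRASES:
        warnings.append(
            f"list has {len(phrases)} phrases; recommended maximum is {MAX_PHRASES}"
        )
    for phrase in phrases:
        # Words are counted by splitting on whitespace.
        word_count = len(phrase.split())
        if word_count > MAX_WORDS_PER_PHRASE:
            warnings.append(
                f"'{phrase}' has {word_count} words; "
                f"recommended maximum is {MAX_WORDS_PER_PHRASE}"
            )
    return warnings


if __name__ == "__main__":
    sample = ["food psych", "getting started with food psych and body positivity"]
    for warning in lint_vocabulary(sample):
        print(warning)
```

The function only reports problems; it does not shorten or drop phrases, since deciding which terms matter most is a judgment call for the list owner.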
Capitalization and casing
Be conscious of capitalization and casing. Rev AI wants to represent your vocabulary terms in the most accurate way. Don't use custom vocabulary just for capitalization or stylistic elements of a term; this feature is better served by focusing on the spelling of the specialized terms.
Languages
Don't mix languages in your custom vocabulary. The Rev AI ASR model is not multilingual. We strongly recommend focusing on the terms in the main source language of your audio.
Pre-compilation
Compile your custom vocabulary list ahead of time. This improves asynchronous transcription performance compared with sending the terms as a list, and it is the only option for streaming transcription.
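Before a list can be compiled, it has to be serialized as a request body. The sketch below only builds that JSON body and makes no network call. The body shape (a custom_vocabularies array of objects, each with a phrases array) is an assumption that should be checked against the current API reference, and build_vocabulary_payload is a hypothetical helper name. Trimming whitespace and removing duplicates are conveniences added for this example.

```python
import json


def build_vocabulary_payload(phrases):
    """Build a JSON request body for a custom vocabulary submission.

    Assumed shape: {"custom_vocabularies": [{"phrases": [...]}]}.
    """
    # Trim whitespace, drop empty entries, and remove duplicates
    # while preserving the original order.
    cleaned = list(dict.fromkeys(p.strip() for p in phrases if p.strip()))
    return json.dumps({"custom_vocabularies": [{"phrases": cleaned}]})


if __name__ == "__main__":
    print(build_vocabulary_payload(["sparkletini", "timi", "Rob Szypko", "Ginnifer"]))
```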
Review
Re-examine your audio and custom vocabulary list periodically. If you start to see the WER increase after you supply a custom vocabulary, narrow down the terms to focus on the ones of highest importance.
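One way to make this review concrete is to compute WER for the same audio with and without the custom vocabulary and compare the two figures. The sketch below uses the standard definition of WER, word-level edit distance divided by the number of reference words. It is an illustration rather than Rev AI's own scoring tool: word_error_rate is a hypothetical helper, and it lowercases the text and splits on whitespace without stripping punctuation, which is a simplification.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


if __name__ == "__main__":
    reference = "try the sparkletini"
    without_vocab = "try the sparkle tini"
    with_vocab = "try the sparkletini"
    print(word_error_rate(reference, without_vocab))
    print(word_error_rate(reference, with_vocab))
```

If the figure with the custom vocabulary is higher than the figure without it across a representative set of files, that is the signal to trim the list.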
Support
To report errors or request assistance, contact the support team by email at support@rev.ai. Always keep logs of failed jobs, including media files and unique job identifiers, as these will help the support team to investigate and resolve your issue.