Best Practices for the Rev AI APIs
By Chance Shiflett, Solutions Engineering Manager - Feb 11, 2022
Introduction
This tutorial recommends best practices to follow when working with the various Rev AI APIs.
Assumptions
This tutorial assumes that you have a Rev AI account and access token. If not, sign up for a free account and generate an access token.
Best practices
Use a supported file format
Rev AI supports most common media formats. For best results, use a lossless format such as FLAC or ALAC, or a lossy format such as MP3 or AAC with a bitrate of 192 kbps or above.
Use a high sample rate
If you control the audio source, record at a sample rate of 16 kHz or higher. Rev AI can also transcribe audio sampled as low as 8 kHz. Submit the audio in its original format.
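As a rough illustration, the sketch below reads the sample rate of a WAV file with Python's standard wave module and compares it against the two thresholds mentioned above. The function names and category labels are hypothetical and are not part of any Rev AI SDK. The check only inspects the file; it does not resample or otherwise modify it.

```python
import wave

# Thresholds taken from the guidance above.
RECOMMENDED_SAMPLE_RATE_HZ = 16_000  # recommended recording rate
LOWEST_MENTIONED_RATE_HZ = 8_000     # lowest rate the guidance mentions


def wav_sample_rate(path: str) -> int:
    """Return the sample rate (in Hz) stored in a WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def classify_sample_rate(rate_hz: int) -> str:
    """Map a sample rate to a hypothetical category label."""
    if rate_hz >= RECOMMENDED_SAMPLE_RATE_HZ:
        return "recommended"
    if rate_hz >= LOWEST_MENTIONED_RATE_HZ:
        return "accepted"
    return "below_lowest_mentioned_rate"
```

For example, classify_sample_rate(wav_sample_rate("call.wav")) reports whether a recording named call.wav (a placeholder file name) meets the recommended rate before you submit it.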
Use multiple channels for multiple speakers
For perfect speaker separation (speaker diarization), record each speaker on their own channel and submit the job using the speaker_channels_count parameter. If the speakers are recorded on a single channel, do not attempt to modify the recording; submit the file as is.
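The following sketch shows one way a job request carrying the speaker_channels_count parameter might be assembled using only the Python standard library. The endpoint URL, the source_config field, and the Bearer authorization header are assumptions made for illustration, and build_job_request is a hypothetical helper; confirm the exact request shape against the API reference before relying on it.

```python
import json
import urllib.request

# Assumed endpoint for asynchronous transcription jobs; verify in the API reference.
JOBS_URL = "https://api.rev.ai/speechtotext/v1/jobs"


def build_job_request(access_token: str, media_url: str,
                      speaker_channels_count: int) -> urllib.request.Request:
    """Build (but do not send) a job request for multichannel audio."""
    if speaker_channels_count < 1:
        raise ValueError("speaker_channels_count must be at least 1")
    payload = {
        # Assumed field for pointing at the media file.
        "source_config": {"url": media_url},
        # One channel per speaker, as recommended above.
        "speaker_channels_count": speaker_channels_count,
    }
    return urllib.request.Request(
        JOBS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send the request. Building and sending are kept separate here so the payload can be inspected first.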
Don't pre-process audio
Don't pre-process, up-sample or down-sample audio. This can distort the audio and reduce the transcript accuracy. The Rev AI speech engine is very robust and has been designed to handle a large variety of audio recordings.
Create custom vocabulary for unusual words
To improve recognition of uncommon words, such as proper names and special technical terms, submit a list of these words as custom vocabulary along with your request. Read the Custom Vocabulary API documentation for more details.
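As a minimal sketch, the helper below prepares a phrase list for submission by trimming whitespace and dropping blank or duplicate entries. The custom_vocabularies and phrases field names are assumptions about the request body, and build_custom_vocabulary_options is a hypothetical helper; check the Custom Vocabulary API documentation for the authoritative format.

```python
def build_custom_vocabulary_options(phrases: list[str]) -> dict:
    """Return job options holding a cleaned custom vocabulary list."""
    cleaned = []
    seen = set()
    for phrase in phrases:
        normalized = phrase.strip()
        key = normalized.lower()  # compare case-insensitively
        if normalized and key not in seen:
            seen.add(key)
            cleaned.append(normalized)
    if not cleaned:
        raise ValueError("at least one non-empty phrase is required")
    # Assumed request shape: a list of vocabularies, each with a phrase list.
    return {"custom_vocabularies": [{"phrases": cleaned}]}
```

The resulting dictionary can be merged into the other job options before the request is serialized to JSON.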
Next steps
Learn more about the topics discussed in this tutorial by visiting the following links: