This section provides guidelines for getting the best possible speech-to-text results from the API.
Rev AI uses FFmpeg and therefore supports every file format that FFmpeg supports. This includes common media formats such as MP3, MP4, Ogg, WAV, PCM, and FLAC, among many others. For best results, use a lossless format such as FLAC or ALAC, or a lossy format such as MP3 or AAC with a bitrate of 192 kbps or above.
If you control the audio source, record at a sample rate of 16 kHz or higher. Audio sampled as low as 8 kHz can also be transcribed. Do not up-sample or down-sample audio; submit it in its original format.
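As a quick pre-submission check, the sample-rate guidance above can be sketched in Python. This is a minimal illustration that assumes an uncompressed WAV file and uses only the standard library `wave` module; the function names and the three labels are hypothetical and not part of the API. The check only reports the rate and never resamples.

```python
import wave

MIN_RECOMMENDED_HZ = 16_000  # recommended minimum from the guideline above
MIN_SUPPORTED_HZ = 8_000     # lowest rate the guideline says can be transcribed


def classify_sample_rate(rate_hz: int) -> str:
    """Label a sample rate against the recording guideline (labels are illustrative)."""
    if rate_hz >= MIN_RECOMMENDED_HZ:
        return "recommended"
    if rate_hz >= MIN_SUPPORTED_HZ:
        return "supported"
    return "below-minimum"


def wav_sample_rate(path: str) -> int:
    """Read the sample rate from a WAV header without modifying the file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()
```

For other containers (MP3, MP4, FLAC), a tool such as FFmpeg's ffprobe would be needed to read the rate; that is outside this sketch.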
For perfect speaker separation (speaker diarization), record each speaker on their own channel and submit the job using the speaker_channels_count parameter. If the speakers are recorded on a single channel, do not attempt to modify the recording; submit the file as is.
Speaker channels incur extra costs as outlined in the Asynchronous Speech-to-Text API documentation.
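The channel guidance above could translate into job options along these lines. This is a sketch under stated assumptions: only speaker_channels_count comes from this section, while the `media_url` field name and the `build_job_options` helper are hypothetical placeholders. Check the Asynchronous Speech-to-Text API documentation for the actual request shape before relying on it.

```python
def build_job_options(media_url: str, speaker_channels: int) -> dict:
    """Build job options, adding speaker_channels_count only for multi-channel audio.

    `media_url` is an assumed field name used for illustration.
    `speaker_channels` is the number of channels that each carry one speaker.
    """
    if speaker_channels < 1:
        raise ValueError("speaker_channels must be at least 1")

    options = {"media_url": media_url}
    if speaker_channels > 1:
        # One speaker per channel: let the service separate speakers by channel.
        options["speaker_channels_count"] = speaker_channels
    # Single-channel audio is submitted as is, with no channel parameter.
    return options
```

Leaving the parameter out for single-channel recordings mirrors the advice to submit such files unmodified, and avoids the extra cost of speaker channels when they add nothing.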
Don't pre-process audio. Pre-processing can distort the audio and reduce transcript accuracy. Our speech engine is designed to handle a wide variety of audio recordings.
To improve recognition of uncommon words, such as proper names and special technical terms, submit a list of these words as custom vocabulary along with your request. Read the Custom Vocabulary API documentation for more details.
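A word list for custom vocabulary might be prepared as follows. This is a minimal sketch: the `custom_vocabularies` and `phrases` field names are assumptions about the request shape, and `build_custom_vocabulary` is a hypothetical helper. Confirm the exact format in the Custom Vocabulary API documentation.

```python
def build_custom_vocabulary(phrases: list[str]) -> dict:
    """Clean a list of uncommon words and wrap it in an assumed request fragment.

    Strips whitespace, drops empty entries, and removes case-insensitive
    duplicates while keeping the first spelling and the original order.
    """
    seen = set()
    cleaned = []
    for phrase in phrases:
        text = phrase.strip()
        key = text.casefold()
        if text and key not in seen:
            seen.add(key)
            cleaned.append(text)
    # Field names below are assumptions, not confirmed by this section.
    return {"custom_vocabularies": [{"phrases": cleaned}]}
```

The returned fragment would be merged into the job request alongside the media reference.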
- Record in a quiet setting.
- Speak clearly, loudly, and slowly.
- Avoid talking over other people.
- Use quality recording equipment, such as an external or dedicated microphone or recorder.
To report errors or request assistance, contact the support team by email at firstname.lastname@example.org. Always keep logs of failed jobs, including media files and unique job identifiers, as these will help the support team to investigate and resolve your issue.
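One way to keep such logs is to append a record for every failed job to a local file. The sketch below is an illustration only: the `record_failed_job` helper, the record fields, and the JSON Lines layout are assumptions, not a format the support team requires.

```python
import json
from datetime import datetime, timezone


def record_failed_job(log_path: str, job_id: str, media_path: str, detail: str) -> dict:
    """Append one failed-job record to a JSON Lines file and return the record."""
    entry = {
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "job_id": job_id,          # unique job identifier returned by the API
        "media_path": media_path,  # where the submitted media file is kept
        "detail": detail,          # failure message or status, as received
    }
    with open(log_path, "a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(entry) + "\n")
    return entry
```

Keeping the media file at the recorded path, rather than deleting it after submission, means it can be attached when contacting support.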