Rev AI supports 30+ languages in the Asynchronous Speech-to-Text API and 9+ languages in the Streaming Speech-to-Text API. New languages are frequently added. Please refer to the current list of supported languages.
This depends on the exact pause length, but a long pause will usually cause the transcript to start a new paragraph when speech resumes. Pauses are indicated by a jump in the timestamps of the words on either side of the pause.
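As a rough illustration of spotting such a jump, the sketch below scans a list of transcript elements and reports gaps between consecutive words. The element shape (type, ts and end_ts fields, in seconds), the function name and the 2-second threshold are assumptions made for this example; check them against the transcript format you actually receive.

```python
from typing import Dict, List, Tuple


def find_pauses(elements: List[Dict], min_gap: float = 2.0) -> List[Tuple[float, float]]:
    """Return (pause_start, pause_end) pairs for gaps of at least min_gap seconds."""
    # Assumption: spoken words have type "text" and carry "ts" / "end_ts" in seconds.
    words = [e for e in elements if e.get("type") == "text"]
    pauses = []
    for prev, nxt in zip(words, words[1:]):
        gap = nxt["ts"] - prev["end_ts"]
        if gap >= min_gap:
            pauses.append((prev["end_ts"], nxt["ts"]))
    return pauses


# Hypothetical sample data, not real API output.
sample = [
    {"type": "text", "value": "Hello", "ts": 0.5, "end_ts": 1.0},
    {"type": "punct", "value": " "},
    {"type": "text", "value": "there", "ts": 1.1, "end_ts": 1.5},
    {"type": "punct", "value": "."},
    {"type": "text", "value": "Welcome", "ts": 6.0, "end_ts": 6.5},
]
print(find_pauses(sample))
```

With this sample, the only reported pause is the one between "there" and "Welcome".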
The following default limits apply per user, per endpoint for the Asynchronous Speech-to-Text API:
- 10,000 transcription requests submitted every 10 minutes.
- 500 transcriptions processed every 10 minutes. Any submissions over this will be accepted but put into a queue and not started until the next interval.
- Maximum audio duration of 17 hours.
- File uploads submitted as multipart/form-data requests to the /jobs endpoint have a concurrency limit of 5 and a file size limit of 2 GB per request.
- File uploads via the Rev AI dashboard or using the source_config job parameter have a file size limit of 5 TB.
These limits are adjustable by Rev AI support.
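To get a feel for the queuing behaviour under the default processing limit, the sketch below estimates how long the last job in a large batch might wait before it starts. It assumes fixed 10-minute intervals and that the whole batch is submitted at the start of an interval; the function name is hypothetical and real scheduling may differ.

```python
import math

PROCESSED_PER_INTERVAL = 500  # default limit quoted above
INTERVAL_MINUTES = 10


def minutes_until_last_job_starts(job_count: int) -> int:
    """Estimate the wait before the final job in a batch begins processing."""
    if job_count <= 0:
        return 0
    # Assumption: jobs are started in batches of 500, one batch per interval.
    intervals_needed = math.ceil(job_count / PROCESSED_PER_INTERVAL)
    return (intervals_needed - 1) * INTERVAL_MINUTES


print(minutes_until_last_job_starts(1200))  # three intervals are needed
```

Under these assumptions, a batch of 1,200 jobs spans three intervals, so the last jobs would start about 20 minutes after submission.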
Rev AI uses FFmpeg and therefore supports all the file formats supported by FFmpeg. This includes all common media formats, such as MP3, MP4, Ogg, WAV, PCM and FLAC, among many others.
When submitting a file as a multipart/form-data request, there is a file size limit of 2 GB per request. If uploading a local file via the Rev AI dashboard or submitting via the source_config job parameter, there is a file size limit of 5 TB. Learn more about file submission methods.
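A client could check a file's size before deciding how to submit it. The following is a minimal sketch, assuming decimal units (1 GB = 10^9 bytes); the function name and return values are hypothetical and not part of any Rev AI SDK.

```python
GB = 1000 ** 3  # assumption: decimal gigabytes
TB = 1000 ** 4  # assumption: decimal terabytes

MULTIPART_LIMIT = 2 * GB
SOURCE_CONFIG_LIMIT = 5 * TB


def choose_submission_method(size_bytes: int) -> str:
    """Pick a submission method that fits within the documented size limits."""
    if size_bytes <= 0:
        raise ValueError("file size must be positive")
    if size_bytes <= MULTIPART_LIMIT:
        return "multipart/form-data"
    if size_bytes <= SOURCE_CONFIG_LIMIT:
        return "source_config"
    raise ValueError("file exceeds the 5 TB limit")


print(choose_submission_method(750 * 1000 ** 2))  # a 750 MB file
print(choose_submission_method(40 * GB))          # a 40 GB file
```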
The maximum allowed audio length is 17 hours. For audio longer than 17 hours, it is necessary to split the audio file into chunks smaller than 17 hours and have them individually transcribed.
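The splitting step can be planned ahead of time. The sketch below computes equal-length chunks that each stay under the 17-hour limit; the 16.5-hour default ceiling and the function name are choices made for this example, and the actual cutting would be done with a separate tool.

```python
import math
from typing import List, Tuple

LIMIT_SECONDS = 17 * 3600  # maximum audio length quoted above


def plan_chunks(total_seconds: float,
                max_chunk_seconds: float = 16.5 * 3600) -> List[Tuple[float, float]]:
    """Return (start, duration) pairs, in seconds, that cover the whole file."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    if not 0 < max_chunk_seconds < LIMIT_SECONDS:
        raise ValueError("max_chunk_seconds must be below the 17-hour limit")
    count = math.ceil(total_seconds / max_chunk_seconds)
    length = total_seconds / count  # equal-length chunks
    return [(i * length, length) for i in range(count)]


for start, duration in plan_chunks(40 * 3600):  # a 40-hour recording
    print(f"start={start:.0f}s duration={duration:.0f}s")
```

Each (start, duration) pair could then be passed to a media tool to cut the file. Cutting at a silent point near each boundary, rather than at the exact offset, helps avoid splitting a word in two.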
The following limits are in place for the Streaming Speech-to-Text API:
- Streaming concurrency limit is 10.
- Time limit per stream is 3 hours.
Job queuing is not supported. If a job is submitted while a user is at the concurrency limit, the API will return a 4029 error.
The streaming concurrency limit is adjustable by Rev AI support.
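Because jobs over the limit are rejected rather than queued, a client may want to cap its own concurrency. The sketch below uses an asyncio semaphore to keep at most 10 sessions open at once; the sleep call is a placeholder for a real streaming session, and all names are hypothetical.

```python
import asyncio

MAX_CONCURRENT_STREAMS = 10  # default streaming concurrency limit


async def run_stream(limiter: asyncio.Semaphore, stats: dict) -> None:
    async with limiter:  # waits here while 10 streams are already active
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # placeholder for a real streaming session
        stats["active"] -= 1


async def run_all(stream_count: int) -> int:
    limiter = asyncio.Semaphore(MAX_CONCURRENT_STREAMS)
    stats = {"active": 0, "peak": 0}
    await asyncio.gather(*(run_stream(limiter, stats) for _ in range(stream_count)))
    return stats["peak"]


peak = asyncio.run(run_all(25))
print(f"peak concurrent streams: {peak}")
```

Even with 25 requested streams, the peak never exceeds the limit, so the client avoids triggering the 4029 error through its own traffic.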
"connected"type message is received, switch to the new WebSocket and begin streaming audio to it. Learn more in our tutorial on recovering from connection errors and timeouts in Rev AI streaming transcription sessions
Yes. Refer to the Streaming Speech-to-Text API documentation for RTMP streams.
There are no daily or weekly limits.
Yes, there are maximum job processing timeouts which differ for each type of job. There is also a maximum transcription time limit per job.
Jobs remain accessible on the server for 30 days after completion unless the account is configured for a shorter auto-deletion period.
Yes. Rev AI accepts most audio and video formats. Each track can be transcribed separately.
,), periods (.) and question marks (
Rev AI supports 8 speakers for English-language transcription and 6 for non-English transcription.
There is currently no way to specify or limit the number of speakers. If each speaker is recorded on a separate channel, then the user can specify the total number of unique speaker channels in the audio via the speaker_channel_count option in the Asynchronous Speech-to-Text API.
Speaker diarization is the process of detecting speaker switches in audio and assigning transcript segments to individual speakers with generic speaker labels such as "Speaker 1" and "Speaker 2".
Speaker identification is the process of identifying individual voices and assigning identities to each in the transcript.
Rev AI supports speaker diarization but does not support speaker identification. Although specific speakers are not identified, Rev AI is able to detect speaker switches and represent them in the transcript with numbered speaker labels. For example, if Anna speaks and then Sarah speaks, the API detects the speaker switch and labels the voices as "Speaker 1" and "Speaker 2".
No. If speakers are not being correctly assigned, note that speaker assignment can be affected by short utterances or variations in audio and other variables.
Rev AI is not a dictation service. This means that Rev AI will not transcribe words such as "comma" and "period" to their respective symbols. If you are looking for a reader-friendly, well-formatted transcript where you can view/toggle the audio along with the transcript, we recommend using our automated transcription service.
No. There is currently no API endpoint to update a custom vocabulary list. However, you can create an unlimited number of custom vocabularies without needing to delete or update existing ones.
Up to 6000 phrases may be submitted per transcription job for English, and up to 1000 for other languages. We recommend submitting a short list of target terms (no more than 500 phrases) as large lists may negatively impact performance and accuracy. Short phrases also do better than long phrases, so keep your phrases on the short side if possible. For more information, refer to the Custom Vocabulary API limits and general rules.
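These limits are easy to check on the client before submitting a job. The sketch below is illustrative only: the function name, the return values and the use of "en" as the English language code are assumptions made for this example.

```python
from typing import List

MAX_PHRASES_ENGLISH = 6000
MAX_PHRASES_OTHER = 1000
RECOMMENDED_MAX = 500


def check_vocabulary(phrases: List[str], language: str = "en") -> str:
    """Classify a phrase list as 'ok', 'over_recommended' or 'too_many'."""
    # Assumption: "en" denotes English; any other code uses the lower limit.
    hard_limit = MAX_PHRASES_ENGLISH if language == "en" else MAX_PHRASES_OTHER
    count = len(phrases)
    if count > hard_limit:
        return "too_many"
    if count > RECOMMENDED_MAX:
        return "over_recommended"
    return "ok"


print(check_vocabulary(["Rev AI", "diarization"]))
print(check_vocabulary(["term"] * 1001, language="es"))
```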
You can usually expect your transcript to be available within 15 minutes of submitting your media file to our Asynchronous Speech-to-Text API, and often sooner if your media is short.
Use the skip_postprocessing parameter to skip some steps (inverse text normalization or ITN, casing and punctuation) of a transcription job. This parameter is useful to reduce the time taken for a transcription job, or to provide greater control over transcription output. The skip_postprocessing parameter is available for both the Asynchronous Speech-to-Text API and the Streaming Speech-to-Text API.
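For an asynchronous job, the parameter would typically be sent as part of the job options. The sketch below only builds a JSON body and does not call the API; the url field inside source_config, the example URL and the function name are assumptions, so confirm the exact request shape in the API reference.

```python
import json


def build_job_options(media_url: str, skip_postprocessing: bool = False) -> str:
    """Build a JSON request body for a transcription job (illustrative only)."""
    # Assumption: source_config accepts a "url" field pointing at the media file.
    options = {"source_config": {"url": media_url}}
    if skip_postprocessing:
        options["skip_postprocessing"] = True
    return json.dumps(options)


# Hypothetical media URL used purely for illustration.
body = build_job_options("https://example.com/interview.mp3", skip_postprocessing=True)
print(body)
```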
The score in a sentiment analysis report represents the intensity or strength of the sentiment. It is not a confidence score. This score is always a value in the range [-1, 1]. A score below -0.3 indicates a negative (sad/angry) sentiment, while a score above 0.3 indicates a positive (joyful/happy) sentiment. Scores in the range [-0.3, 0.3] indicate neutral sentiment.
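The thresholds described above map directly onto a small helper. This is a sketch with a hypothetical function name; the boundary values -0.3 and 0.3 are treated as neutral, matching the closed range given in the text.

```python
def classify_sentiment(score: float) -> str:
    """Map a sentiment score in [-1, 1] to 'negative', 'neutral' or 'positive'."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be in the range [-1, 1]")
    if score < -0.3:
        return "negative"
    if score > 0.3:
        return "positive"
    return "neutral"


for value in (-0.8, -0.3, 0.1, 0.31):
    print(value, classify_sentiment(value))
```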
Rev AI's world-class speech engine is available on-premise in the form of a Docker container. It can run Rev AI's asynchronous speech-to-text on recorded media and is deployable in any Docker-supported environment. Learn more about Asynchronous Speech-To-Text On-Premise or contact us.
Hardware requirements will vary depending on the number of transcriptions to be processed concurrently and the length of the audio.
Base configuration for processing a single transcription up to 1 hour in length:
- 1 CPU
- 7.5 GB RAM for processing audio lengths up to 1 hour
- 9.03 GB + 650 MB of available disk space
- 1 CPU per concurrent audio file being transcribed
- 7.5 GB RAM as a base, transcribing a single file up to 1 hour in length
  - For each additional concurrent transcription add 1.5 GB.
  - For each additional audio hour in file length add 1.5 GB.
- 9.03 GB image size
- 650 MB container size while processing a single transcription up to 1 hour in length
  - For each additional concurrent transcription add 650 MB.
  - For each additional audio hour in file length add 650 MB.
- Longer audio files require more storage for processing. Container size will vary depending on: the size of the file being transcribed, length of the audio and number of concurrent transcriptions.
- All files created during processing are deleted when the transcript is completed.
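The figures above can be turned into a rough sizing calculator. This is only a sketch: it assumes the per-transcription and per-hour increments are additive, that partial hours round up, that extra hours are counted for every concurrent file, and that 650 MB equals 0.65 GB. The function name is hypothetical.

```python
import math
from typing import Dict, List


def estimate_resources(audio_hours: List[float]) -> Dict[str, float]:
    """Estimate CPU, RAM and disk for files transcribed concurrently.

    audio_hours holds the duration, in hours, of each concurrent file.
    """
    if not audio_hours:
        raise ValueError("at least one file is required")
    concurrent = len(audio_hours)
    # Assumption: each started hour beyond the first counts as one extra hour.
    extra_hours = sum(max(0, math.ceil(h) - 1) for h in audio_hours)
    increments = (concurrent - 1) + extra_hours
    return {
        "cpus": concurrent,                                    # 1 CPU per concurrent file
        "ram_gb": round(7.5 + 1.5 * increments, 2),            # 7.5 GB base + 1.5 GB steps
        "disk_gb": round(9.03 + 0.65 * (1 + increments), 2),   # image + container growth
    }


print(estimate_resources([1]))     # base configuration
print(estimate_resources([1, 3]))  # two concurrent files, one of them 3 hours long
```

Treat the result as a starting point for capacity planning and validate it against observed usage in your own environment.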
You are allowed a maximum of 2 access tokens at a time.
This is only required for applications which are to be billed separately. This is recommended when the customer has multiple environments or applications.
Yes. Learn more about Rev AI's HIPAA compliance.
In order for the Rev AI team to make sure your account and the data sent through it are HIPAA-enabled (and thus protected), you must sign our Business Associate Agreement (BAA) and an updated MSA. This BAA is an explicit agreement to make sure that both parties understand the responsibilities and contingencies associated with processing data which may contain PHI.
Once you have reviewed and signed the BAA and updated MSA:
- Create a new Rev AI account that you will use for HIPAA-enabled orders.
- Send your new Rev AI account information to your sales contact.
- Rev AI will update your account to enable HIPAA-compliant processing and notify you when this is complete.
- Confirm your account is HIPAA-enabled by visiting
Once the BAA has been executed and your account has been updated, your account will be ready to process PHI.
Learn more about Rev AI's HIPAA compliance.
Once your account is HIPAA-enabled, there are no changes to how you should submit API jobs, as your compliance configuration is done at the account level and not at job level.