Frequently Asked Questions


What languages does Rev AI support?

Rev AI supports 30+ languages in the Asynchronous Speech-to-Text API and 9+ languages in the Streaming Speech-to-Text API. New languages are frequently added. Please refer to the Asynchronous Speech-to-Text API documentation for the most current list of supported languages.

Is there a way to transcribe from one language to another (automatic translation)?



How are long pauses in speech represented in the transcript?

The representation depends on the length of the pause, but a long pause usually causes the transcript to start a new paragraph when speech resumes. Pauses also appear as a jump between the timestamps of the words on either side of the pause.
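As a rough illustration of reading pauses from word timestamps, the sketch below scans a list of transcript elements and reports gaps above a threshold. The element shape (`type`, `value`, `ts`, `end_ts`), the function name `find_pauses`, and the 2-second threshold are assumptions made for this example; check them against the transcript format you actually receive.

```python
from typing import Dict, List, Tuple


def find_pauses(elements: List[Dict], min_gap: float = 2.0) -> List[Tuple[float, float]]:
    """Return (pause_start, pause_end) pairs for gaps of at least min_gap seconds.

    Assumes each word element carries a start time ("ts") and an end time
    ("end_ts") in seconds; elements without timestamps are skipped.
    """
    words = [
        e for e in elements
        if e.get("type") == "text" and "ts" in e and "end_ts" in e
    ]
    pauses = []
    # Compare the end of each word with the start of the next one.
    for prev, curr in zip(words, words[1:]):
        if curr["ts"] - prev["end_ts"] >= min_gap:
            pauses.append((prev["end_ts"], curr["ts"]))
    return pauses


# Hypothetical sample data, not real API output.
elements = [
    {"type": "text", "value": "Hello", "ts": 0.5, "end_ts": 1.0},
    {"type": "punct", "value": " "},
    {"type": "text", "value": "there", "ts": 1.1, "end_ts": 1.5},
    {"type": "punct", "value": "."},
    {"type": "text", "value": "Welcome", "ts": 6.0, "end_ts": 6.6},
]
print(find_pauses(elements))  # [(1.5, 6.0)]
```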

Are there limits on the number of jobs that can be processed concurrently?

The following default limits apply per user, per endpoint for the Asynchronous Speech-to-Text API:

  • 10,000 transcription requests submitted every 10 minutes.
  • 500 transcriptions processed every 10 minutes.
  • Multi-part/form-data requests to the /jobs endpoint have a concurrency limit of 5 and a file size limit of 2 GB.

Submissions that exceed these limits are accepted but placed in a queue, and are not started until the next interval.

These limits are adjustable by Rev AI support.
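Because submissions over the limit are queued rather than rejected, client-side throttling is optional. If you prefer to keep your own submissions under the default of 10,000 requests per 10 minutes, one possible approach is a sliding-window counter like the sketch below. The class name and its parameters are hypothetical, and the service may count its intervals differently (for example, as fixed windows), so treat this as an approximation.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Client-side limiter: at most max_requests per window_seconds."""

    def __init__(
        self,
        max_requests: int = 10_000,       # default submission limit from the FAQ
        window_seconds: float = 600.0,    # 10 minutes
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        self._sent: Deque[float] = deque()  # timestamps of recent submissions

    def try_acquire(self) -> bool:
        """Record a submission and return True if it fits in the current window."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            self._sent.append(now)
            return True
        return False
```

Call `try_acquire()` before each job submission and delay the submission when it returns `False`. The injectable `clock` argument exists only to make the class easy to test.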

Are there limits on the number of streams that can be processed concurrently?

The following limits are in place for the Streaming Speech-to-Text API:

  • Streaming concurrency limit is 10.
  • Time limit per stream is 3 hours.

The streaming concurrency limit is adjustable by Rev AI support.

What if my stream is longer than 3 hours?

When your stream approaches the 3-hour limit, you should initialize a new concurrent WebSocket connection. Once your WebSocket connection is accepted and the "connected" type message is received, switch to the new WebSocket and begin streaming audio to it.
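The handover decision can be sketched with two small helpers: one that decides when to open the second connection, and one that recognizes the "connected" message. The 120-second lead time, the function names, and the assumption that messages arrive as JSON objects with a `type` field are illustrative choices, not documented behavior. The WebSocket handling itself is left out.

```python
import json

STREAM_LIMIT_SECONDS = 3 * 60 * 60  # 3-hour per-stream limit from the FAQ


def should_start_handover(elapsed_seconds: float, lead_seconds: float = 120.0) -> bool:
    """Return True once the stream is within lead_seconds of the time limit."""
    return elapsed_seconds >= STREAM_LIMIT_SECONDS - lead_seconds


def is_connected_message(raw: str) -> bool:
    """Return True if raw is a JSON object whose "type" is "connected".

    The message shape is an assumption for this sketch.
    """
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(msg, dict) and msg.get("type") == "connected"
```

In a streaming loop, you would check `should_start_handover` periodically, open the new connection when it returns `True`, and redirect audio to the new socket only after `is_connected_message` returns `True` for a message received on it.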

Does Rev AI support RTMP streams?

Yes. Refer to the Streaming Speech-to-Text API documentation for RTMP streams.

Are there daily or weekly limits?

There are no daily or weekly limits.

What punctuation labels does Rev AI support?

Rev AI outputs three punctuation labels: commas (,), periods (.), and question marks (?).

What is the maximum number of speakers supported?

Rev AI supports 8 speakers for English-language transcription and 6 speakers for non-English transcription.

Is it possible to explicitly specify or limit the number of detected speakers?

There is currently no way to specify or limit the number of speakers. If each speaker is recorded on a separate channel, then the user can specify the total number of unique speaker channels in the audio via the speaker_channel_count option in the Asynchronous Speech-to-Text API.
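For illustration, a job request body that sets `speaker_channel_count` might be assembled as below. Only the `speaker_channel_count` option comes from this FAQ. The `media_url` field name, the example URL, and the helper `build_job_options` are assumptions for the sketch, so confirm the request schema in the Asynchronous Speech-to-Text API documentation.

```python
import json
from typing import Any, Dict


def build_job_options(media_url: str, speaker_channel_count: int) -> Dict[str, Any]:
    """Build a job request body for audio with one speaker per channel."""
    if speaker_channel_count < 1:
        raise ValueError("speaker_channel_count must be at least 1")
    return {
        "media_url": media_url,  # assumed field name for the audio location
        "speaker_channel_count": speaker_channel_count,
    }


# Example: a two-channel call recording with one speaker on each channel.
print(json.dumps(build_job_options("https://example.com/call.wav", 2), indent=2))
```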

Does Rev AI support speaker identification?

Speaker diarization is the process of detecting speaker switches in audio and assigning transcript segments to individual speakers with generic speaker labels such as "Speaker 1" and "Speaker 2".

Speaker identification is the process of identifying individual voices and assigning identities to each in the transcript.

Rev AI supports speaker diarization but does not support speaker identification. Although specific speakers are not identified, Rev AI is able to detect speaker switches and represent them in the transcript with numbered speaker labels. For example, if Anna speaks and then Sarah speaks, the API detects the speaker switch and labels the voices as "Speaker 1" and "Speaker 2".
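To show what working with generic speaker labels can look like, the sketch below renders a diarized transcript as labeled lines. The transcript structure (a list of monologues, each with a zero-based `speaker` index and a list of `elements`) and the function name are assumptions for this example.

```python
from typing import Dict, List


def render_transcript(monologues: List[Dict]) -> str:
    """Render each monologue as 'Speaker N: text' using generic labels."""
    lines = []
    for monologue in monologues:
        text = "".join(element["value"] for element in monologue["elements"])
        # Assumes a zero-based speaker index; shown to readers as 1-based.
        lines.append(f"Speaker {monologue['speaker'] + 1}: {text}")
    return "\n".join(lines)


# Hypothetical sample data: Anna speaks first, then Sarah.
monologues = [
    {"speaker": 0, "elements": [
        {"type": "text", "value": "Hi"},
        {"type": "punct", "value": ", "},
        {"type": "text", "value": "Sarah"},
        {"type": "punct", "value": "."},
    ]},
    {"speaker": 1, "elements": [
        {"type": "text", "value": "Hello"},
        {"type": "punct", "value": ", "},
        {"type": "text", "value": "Anna"},
        {"type": "punct", "value": "."},
    ]},
]
print(render_transcript(monologues))
```

The labels carry no identity: the output says only that two different voices were detected, not who they belong to.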

Are any parameters available to control speaker assignment for audio segments?

No. If speakers are not being assigned correctly, note that speaker assignment can be affected by short utterances, variations in audio, and other variables.

Can I automatically turn Rev AI off if there are long periods of silence?


Is it possible to add words to the custom vocabulary after initial submission?

No. There is currently no API endpoint to update a custom vocabulary list. However, you can create an unlimited number of custom vocabularies without needing to delete or update existing ones.

What does the score in a sentiment analysis report represent?

The score in a sentiment analysis report represents the intensity or strength of the sentiment. It is not a confidence score. This score is always a value in the range [-1, 1]. A score below -0.3 indicates a negative (sad/angry) sentiment, while a score above 0.3 indicates a positive (joyful/happy) sentiment. Scores in the range [-0.3, 0.3] indicate neutral sentiment.
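The thresholds above translate directly into a small classifier. The function name and label strings are chosen for this example; the boundaries follow the ranges stated in the answer, with -0.3 and 0.3 themselves counted as neutral.

```python
def classify_sentiment(score: float) -> str:
    """Map a sentiment score in [-1, 1] to 'negative', 'neutral', or 'positive'."""
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score must be in [-1, 1], got {score}")
    if score < -0.3:
        return "negative"
    if score > 0.3:
        return "positive"
    return "neutral"  # covers the closed range [-0.3, 0.3]


for s in (-0.8, -0.3, 0.0, 0.3, 0.65):
    print(s, classify_sentiment(s))
```

Remember that the score measures the strength of the sentiment, so a score of 0.9 means a more strongly positive statement than 0.4, not a more certain one.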


Can I use any database or backend with Rev AI?


Are there limits on the number of users who can use Rev AI in my application?


Can I deploy Rev AI on-premise?

Rev AI's world-class speech engine is available on-premise in the form of a Docker container. It can run Rev AI's asynchronous speech-to-text on recorded media and is deployable in any Docker-supported environment. [Learn more about Asynchronous Speech-To-Text On-Premise]( AI+Speech+to+Text+On-Premises.pdf) or contact us.


How many access tokens can I have?

You are allowed a maximum of 2 access tokens at a time.

Do I need different access tokens for different applications?

Separate access tokens are required only for applications that are billed separately. They are also recommended when you have multiple environments or applications.