Rev AI supports 30+ languages in the Asynchronous Speech-to-Text API and 9+ languages in the Streaming Speech-to-Text API. New languages are frequently added. Please refer to the Asynchronous Speech-to-Text API documentation for the most current list of supported languages.
This depends on the exact pause length, but usually a long pause causes the transcript to start a new paragraph when speech resumes. Pauses show up as a jump in the timestamps of the words on either side of the pause.
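This timestamp-jump behavior can be checked client-side. A minimal sketch, assuming the word list shape loosely mirrors the `ts`/`end_ts` fields of Rev AI transcript elements; the 2-second threshold is an illustrative assumption, not a documented value:

```python
def split_on_pauses(words, pause_threshold=2.0):
    """Group words into paragraphs wherever the gap between one word's
    end timestamp and the next word's start timestamp exceeds the threshold."""
    paragraphs = []
    current = []
    prev_end = None
    for w in words:
        if prev_end is not None and w["ts"] - prev_end > pause_threshold:
            paragraphs.append(current)
            current = []
        current.append(w["value"])
        prev_end = w["end_ts"]
    if current:
        paragraphs.append(current)
    return [" ".join(p) for p in paragraphs]

words = [
    {"value": "Hello", "ts": 0.0, "end_ts": 0.4},
    {"value": "there.", "ts": 0.5, "end_ts": 0.9},
    {"value": "Welcome.", "ts": 5.2, "end_ts": 5.8},  # 4.3 s gap -> new paragraph
]
print(split_on_pauses(words))  # ['Hello there.', 'Welcome.']
```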
The following default limits apply per user, per endpoint for the Asynchronous Speech-to-Text API:
- 10,000 transcription requests submitted every 10 minutes.
- 500 transcriptions processed every 10 minutes.
- Multipart/form-data requests to the `/jobs` endpoint have a concurrency limit of 5 and a file size limit of 2 GB.
Submissions over these limits are still accepted, but they are queued and not started until the next interval.
These limits are adjustable by Rev AI support.
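To stay under the submission limits client-side, a simple sliding-window throttle can help. This is a sketch only, not part of any Rev AI SDK; the limit values mirror the defaults above, and the class and method names are illustrative:

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Track request timestamps and refuse submissions once the
    per-window limit (e.g. 10,000 requests per 10 minutes) is reached."""

    def __init__(self, max_requests=10_000, window_seconds=600):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def can_submit(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        return len(self.timestamps) < self.max_requests

    def record(self, now=None):
        self.timestamps.append(time.monotonic() if now is None else now)

limiter = SlidingWindowLimiter(max_requests=3, window_seconds=600)
for t in (0, 1, 2):
    limiter.record(now=t)
print(limiter.can_submit(now=3))    # False: 3 requests already in the window
print(limiter.can_submit(now=700))  # True: the earlier requests aged out
```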
The following limits are in place for the Streaming Speech-to-Text API:
- Streaming concurrency limit is 10.
- Time limit per stream is 3 hours.
The streaming concurrency limit is adjustable by Rev AI support.
When your stream approaches the 3-hour limit, initialize a new concurrent WebSocket connection. Once the new connection is accepted and its `"connected"` type message is received, switch to the new WebSocket and begin streaming audio to it.
Yes. Refer to the Streaming Speech-to-Text API documentation for RTMP streams.
There are no daily or weekly limits.
Rev AI outputs three punctuation labels: commas (`,`), periods (`.`), and question marks (`?`).
Rev AI supports up to 8 speakers for English-language transcription and up to 6 for non-English transcription.
There is currently no way to specify or limit the number of speakers. If each speaker is recorded on a separate channel, the user can specify the total number of unique speaker channels in the audio via the `speaker_channel_count` option in the Asynchronous Speech-to-Text API.
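For channel-separated audio, the option is passed when the job is submitted. A sketch of the request body only; the `media_url` value is a placeholder, and the full job-submission schema should be checked against the Asynchronous Speech-to-Text API reference:

```python
import json

# Submit-job payload for audio recorded with one speaker per channel.
# speaker_channel_count tells Rev AI how many separate speaker channels
# the file contains; the media_url below is a placeholder.
payload = {
    "media_url": "https://example.com/two-channel-call.wav",
    "speaker_channel_count": 2,
}

body = json.dumps(payload)
print(body)
```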
Speaker diarization is the process of detecting speaker switches in audio and assigning transcript segments to individual speakers with generic speaker labels such as "Speaker 1" and "Speaker 2".
Speaker identification is the process of identifying individual voices and assigning identities to each in the transcript.
Rev AI supports speaker diarization but does not support speaker identification. Although specific speakers are not identified, Rev AI is able to detect speaker switches and represent them in the transcript with numbered speaker labels. For example, if Anna speaks and then Sarah speaks, the API detects the speaker switch and labels the voices as "Speaker 1" and "Speaker 2".
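Diarized output can be rendered with those generic labels. A minimal sketch, assuming a monologue structure that loosely mirrors Rev AI's JSON transcript, where each monologue carries a zero-based `speaker` index; exact field names should be checked against the API reference:

```python
def render_transcript(monologues):
    """Turn diarized monologues into 'Speaker N: text' lines.
    Speaker indices are zero-based, so display labels add one."""
    lines = []
    for m in monologues:
        text = " ".join(e["value"] for e in m["elements"])
        lines.append(f"Speaker {m['speaker'] + 1}: {text}")
    return lines

monologues = [
    {"speaker": 0, "elements": [{"value": "Hi,"}, {"value": "Sarah."}]},
    {"speaker": 1, "elements": [{"value": "Hi,"}, {"value": "Anna."}]},
]
print(render_transcript(monologues))
# ['Speaker 1: Hi, Sarah.', 'Speaker 2: Hi, Anna.']
```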
No. If speakers are not being assigned correctly, note that speaker assignment can be affected by short utterances, variations in audio quality, and other factors.
No. There is currently no API endpoint to update a custom vocabulary list. However, you can create an unlimited number of custom vocabularies without needing to delete or update existing ones.
The score in a sentiment analysis report represents the intensity or strength of the sentiment. It is not a confidence score. This score is always a value in the range [-1, 1]. A score below -0.3 indicates a negative (sad/angry) sentiment, while a score above 0.3 indicates a positive (joyful/happy) sentiment. Scores in the range [-0.3, 0.3] indicate neutral sentiment.
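These thresholds translate directly into a labeling function. A minimal sketch using the ranges above; the function name is illustrative:

```python
def sentiment_label(score):
    """Map a sentiment score in [-1, 1] to a label using the documented
    thresholds: below -0.3 is negative, above 0.3 is positive,
    and anything in [-0.3, 0.3] is neutral."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be in [-1, 1]")
    if score < -0.3:
        return "negative"
    if score > 0.3:
        return "positive"
    return "neutral"

print(sentiment_label(-0.75))  # negative
print(sentiment_label(0.1))    # neutral
print(sentiment_label(0.6))    # positive
```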
Rev AI's world-class speech engine is available on-premises in the form of a Docker container. It can run Rev AI's asynchronous speech-to-text on recorded media and is deployable in any Docker-supported environment. [Learn more about Asynchronous Speech-To-Text On-Premise](https://public-rev.s3-us-west-2.amazonaws.com/revai/Rev%20AI+Speech+to+Text+On-Premises.pdf) or contact us.
You are allowed a maximum of 2 access tokens at a time.
This is only required for applications that are billed separately. It is recommended when a customer has multiple environments or applications.