Rev AI supports 30+ languages in the Asynchronous Speech-to-Text API and 9+ languages in the Streaming Speech-to-Text API. New languages are frequently added. Please refer to the Asynchronous Speech-to-Text API documentation for the most current list of supported languages.
This depends on the exact pause length, but usually a long pause causes the transcript to start a new paragraph when speech resumes. Pauses show up as a jump in the timestamps of the words on either side of the pause.
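This timestamp-jump behavior can be checked client-side. A minimal sketch, assuming the word list shape loosely mirrors the `ts`/`end_ts` fields of Rev AI transcript elements; the 2-second threshold is an illustrative assumption, not a documented value:

```python
def split_on_pauses(words, pause_threshold=2.0):
    """Group words into paragraphs wherever the gap between one word's
    end timestamp and the next word's start timestamp exceeds the threshold."""
    paragraphs = []
    current = []
    prev_end = None
    for w in words:
        if prev_end is not None and w["ts"] - prev_end > pause_threshold:
            paragraphs.append(current)
            current = []
        current.append(w["value"])
        prev_end = w["end_ts"]
    if current:
        paragraphs.append(current)
    return [" ".join(p) for p in paragraphs]

words = [
    {"value": "Hello", "ts": 0.0, "end_ts": 0.4},
    {"value": "there.", "ts": 0.5, "end_ts": 0.9},
    {"value": "Welcome.", "ts": 5.2, "end_ts": 5.8},  # 4.3 s gap -> new paragraph
]
print(split_on_pauses(words))  # ['Hello there.', 'Welcome.']
```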
The following default limits apply per user, per endpoint for the Asynchronous Speech-to-Text API:
- 10,000 transcription requests submitted every 10 minutes.
- 500 transcriptions processed every 10 minutes.
- Multipart/form-data requests to the `/jobs` endpoint have a concurrency limit of 5 and a file size limit of 2 GB.
Submissions over these limits are still accepted, but they are queued and not started until the next interval.
These limits are adjustable by Rev AI support.
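To stay under the submission limits client-side, a simple sliding-window throttle can help. This is a sketch only, not part of any Rev AI SDK; the limit values mirror the defaults above, and the class and method names are illustrative:

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Track request timestamps and refuse submissions once the
    per-window limit (e.g. 10,000 requests per 10 minutes) is reached."""

    def __init__(self, max_requests=10_000, window_seconds=600):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def can_submit(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        return len(self.timestamps) < self.max_requests

    def record(self, now=None):
        self.timestamps.append(time.monotonic() if now is None else now)

limiter = SlidingWindowLimiter(max_requests=3, window_seconds=600)
for t in (0, 1, 2):
    limiter.record(now=t)
print(limiter.can_submit(now=3))    # False: 3 requests already in the window
print(limiter.can_submit(now=700))  # True: the earlier requests aged out
```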
The following limits are in place for the Streaming Speech-to-Text API:
- Streaming concurrency limit is 10.
- Time limit per stream is 3 hours.
The streaming concurrency limit is adjustable by Rev AI support.
When your stream approaches the 3-hour limit, initialize a new concurrent WebSocket connection. Once the new connection is accepted and its `"connected"` type message is received, switch to the new WebSocket and begin streaming audio to it.
Yes. Refer to the Streaming Speech-to-Text API documentation for RTMP streams.
There are no daily or weekly limits.
Rev AI outputs three punctuation labels: commas (`,`), periods (`.`), and question marks (`?`).
Rev AI supports up to 8 speakers for English-language transcription and up to 6 for non-English transcription.
There is currently no way to specify or limit the number of speakers. If each speaker is recorded on a separate channel, the user can specify the total number of unique speaker channels in the audio via the `speaker_channel_count` option in the Asynchronous Speech-to-Text API.
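For channel-separated audio, the option is passed when the job is submitted. A sketch of the request body only; the `media_url` value is a placeholder, and the full job-submission schema should be checked against the Asynchronous Speech-to-Text API reference:

```python
import json

# Submit-job payload for audio recorded with one speaker per channel.
# speaker_channel_count tells Rev AI how many separate speaker channels
# the file contains; the media_url below is a placeholder.
payload = {
    "media_url": "https://example.com/two-channel-call.wav",
    "speaker_channel_count": 2,
}

body = json.dumps(payload)
print(body)
```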
Speaker diarization is the process of detecting speaker switches in audio and assigning transcript segments to individual speakers with generic speaker labels such as "Speaker 1" and "Speaker 2".
Speaker identification is the process of identifying individual voices and assigning identities to each in the transcript.
Rev AI supports speaker diarization but does not support speaker identification. Although specific speakers are not identified, Rev AI is able to detect speaker switches and represent them in the transcript with numbered speaker labels. For example, if Anna speaks and then Sarah speaks, the API detects the speaker switch and labels the voices as "Speaker 1" and "Speaker 2".
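Diarized output can be rendered with those generic labels. A minimal sketch, assuming a monologue structure that loosely mirrors Rev AI's JSON transcript, where each monologue carries a zero-based `speaker` index; exact field names should be checked against the API reference:

```python
def render_transcript(monologues):
    """Turn diarized monologues into 'Speaker N: text' lines.
    Speaker indices are zero-based, so display labels add one."""
    lines = []
    for m in monologues:
        text = " ".join(e["value"] for e in m["elements"])
        lines.append(f"Speaker {m['speaker'] + 1}: {text}")
    return lines

monologues = [
    {"speaker": 0, "elements": [{"value": "Hi,"}, {"value": "Sarah."}]},
    {"speaker": 1, "elements": [{"value": "Hi,"}, {"value": "Anna."}]},
]
print(render_transcript(monologues))
# ['Speaker 1: Hi, Sarah.', 'Speaker 2: Hi, Anna.']
```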
No. If speakers are not being assigned correctly, note that speaker assignment can be affected by short utterances, variations in audio quality, and other factors.
No. There is currently no API endpoint to update a custom vocabulary list. However, you can create an unlimited number of custom vocabularies without needing to delete or update existing ones.
The score in a sentiment analysis report represents the intensity or strength of the sentiment. It is not a confidence score. This score is always a value in the range [-1, 1]. A score below -0.3 indicates a negative (sad/angry) sentiment, while a score above 0.3 indicates a positive (joyful/happy) sentiment. Scores in the range [-0.3, 0.3] indicate neutral sentiment.
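These thresholds translate directly into a labeling function. A minimal sketch using the ranges above; the function name is illustrative:

```python
def sentiment_label(score):
    """Map a sentiment score in [-1, 1] to a label using the documented
    thresholds: below -0.3 is negative, above 0.3 is positive,
    and anything in [-0.3, 0.3] is neutral."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be in [-1, 1]")
    if score < -0.3:
        return "negative"
    if score > 0.3:
        return "positive"
    return "neutral"

print(sentiment_label(-0.75))  # negative
print(sentiment_label(0.1))    # neutral
print(sentiment_label(0.6))    # positive
```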
Rev AI's world-class speech engine is available on-premises in the form of a Docker container. It can run Rev AI's asynchronous speech-to-text on recorded media and is deployable in any Docker-supported environment. [Learn more about Asynchronous Speech-To-Text On-Premise](https://public-rev.s3-us-west-2.amazonaws.com/revai/Rev%20AI+Speech+to+Text+On-Premises.pdf) or contact us.
You are allowed a maximum of 2 access tokens at a time.
This is only required for applications that are billed separately. It is recommended when a customer has multiple environments or applications.