Transcription Options

A number of transcription options are available, both machine and human.

Machine transcription

Default model

The v2 ASR model is our latest, most advanced ASR model. It supports English only and has full feature parity with our older v1 ASR model. It is our default model for all ASR transcription requests.

When no transcriber option is provided, or if the transcriber option is set to machine, transcription will be performed by the v2 ASR model.

The transcriber option is ignored for non-English transcription requests.
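The selection rules above can be sketched as a small helper that builds the options for a job request. This is a minimal illustration, not the official client: the function name build_job_options is hypothetical, and the media_url and language field names are assumptions not stated on this page. Only the transcriber option and its machine and human values come from the text above.

```python
from typing import Optional

# Values of the transcriber option described in this section.
ALLOWED_TRANSCRIBERS = ("machine", "human")


def build_job_options(media_url: str,
                      transcriber: Optional[str] = None,
                      language: str = "en") -> dict:
    """Build a job request body (hypothetical helper).

    The "media_url" and "language" keys are assumed field names.
    """
    if transcriber is not None and transcriber not in ALLOWED_TRANSCRIBERS:
        raise ValueError(f"unknown transcriber: {transcriber!r}")

    options = {"media_url": media_url}

    if language != "en":
        # The transcriber option is ignored for non-English requests,
        # so this sketch leaves it out of the body entirely.
        options["language"] = language
        return options

    if transcriber is not None:
        # Omitting the option or sending "machine" both select the
        # default v2 ASR model; "human" requests human transcription.
        options["transcriber"] = transcriber
    return options
```

Leaving the option out and sending machine are equivalent according to the rules above, so the helper only adds the key when the caller supplies one.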


The v2 ASR model represents an entirely new architecture that promises 25-30% relative improvement in accuracy. Read more in our blog post describing the new v2 ASR model.

V1 ASR model

The v1 ASR model was our older ASR model. It was deprecated on September 8, 2022 and is no longer available.

Human transcription

When the transcriber option is set to human, the audio file will be transcribed by a human. Since the job is handled by a human transcriber, the expected behavior differs from the ASR transcription services. Human transcription is only available for English transcription requests.

List pricing

Refer to the Rev AI pricing page for up-to-date prices. Add-on prices are stated below and are in addition to the list pricing.

Product              Cost (per minute)
Rush (add-on)        +$1.25
Verbatim (add-on)    +$0.50

Please reach out if you have any questions on pricing.


Enterprise pricing is available for enterprise customers. Contact your Rev Account Manager for more information.
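As a worked example of the add-on arithmetic, the sketch below estimates only the add-on portion of a human transcription job from the per-minute prices listed above. The function name addon_cost is hypothetical. The page does not say how partial minutes are rounded for billing, so the sketch multiplies the duration as given, and the base list price is not included.

```python
from decimal import Decimal

# Add-on prices per minute, as listed in this section.
RUSH_ADDON_PER_MINUTE = Decimal("1.25")
VERBATIM_ADDON_PER_MINUTE = Decimal("0.50")


def addon_cost(minutes, rush: bool = False, verbatim: bool = False) -> Decimal:
    """Estimate the add-on charges only (hypothetical helper).

    Excludes the base list price and applies no billing rounding,
    since neither is specified here.
    """
    rate = Decimal("0")
    if rush:
        rate += RUSH_ADDON_PER_MINUTE
    if verbatim:
        rate += VERBATIM_ADDON_PER_MINUTE
    # Convert through str so float inputs such as 2.5 stay exact.
    return Decimal(str(minutes)) * rate
```

For a 10 minute file with both add-ons, the combined add-on rate is $1.75 per minute, so the estimate is $17.50 on top of the list price.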

Turnaround time

The expected turnaround time for human transcription jobs is 12 to 24 hours. Note that human transcription results are not available via the Rev website; they are returned via the API like any other Rev AI job.

Allowed content

Certain types of audio content that our transcribers consider unworkable will not be transcribed. Audio that consists of non-English speech or music may be considered unworkable. Valid audio consists primarily of spoken English.

Custom vocabulary

Phrases from the custom vocabulary list (if available) are sent to the human transcriber as a glossary. This is limited to a maximum of 20 phrases and a maximum of 255 characters per phrase.
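The glossary limits above can be checked on the client before a job is submitted. The following is a minimal sketch: the function name check_glossary_phrases is hypothetical, and it only reports problems. How the API itself handles a list that exceeds the limits is not described here.

```python
from typing import List

# Limits stated in this section for phrases sent to human transcribers.
MAX_PHRASES = 20
MAX_PHRASE_LENGTH = 255


def check_glossary_phrases(phrases: List[str]) -> List[str]:
    """Return a list of problems; an empty list means the phrases fit."""
    problems = []
    if len(phrases) > MAX_PHRASES:
        problems.append(
            f"{len(phrases)} phrases supplied, maximum is {MAX_PHRASES}"
        )
    for index, phrase in enumerate(phrases):
        if len(phrase) > MAX_PHRASE_LENGTH:
            problems.append(
                f"phrase {index} has {len(phrase)} characters, "
                f"maximum is {MAX_PHRASE_LENGTH}"
            )
    return problems
```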

Additional request parameters



Rush

Rush charges apply.

Set the rush parameter to true to give the job higher priority among the jobs worked on by human transcribers.



Verbatim

Verbatim charges apply.

Set the verbatim parameter to true to tell the human transcriber to transcribe every syllable spoken, so the transcript will include disfluencies (e.g. 'umm', 'ah') and false starts. When the parameter is not specified or is set to false, the transcribers follow the transcription style guide.
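A request body that opts in to these add-ons might be assembled as follows. This is a sketch under assumptions: the function name human_job_options is hypothetical and media_url is an assumed field name. The transcriber, rush and verbatim options are the ones described in this section.

```python
import json


def human_job_options(media_url: str,
                      rush: bool = False,
                      verbatim: bool = False) -> dict:
    """Build a human transcription request body (hypothetical helper)."""
    options = {
        "media_url": media_url,  # assumed field name
        "transcriber": "human",
    }
    # Both add-ons are billed per minute, so this sketch only sends
    # them when the caller explicitly asks for them.
    if rush:
        options["rush"] = True
    if verbatim:
        options["verbatim"] = True
    return options


if __name__ == "__main__":
    body = human_job_options("https://example.com/interview.mp3", rush=True)
    print(json.dumps(body, indent=2))
```

Leaving verbatim out has the same effect as setting it to false: the transcribers follow the transcription style guide.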

Segments to transcribe

Use the segments_to_transcribe parameter to specify which sections of the audio file need to be transcribed. Segments must be at least 1 minute in length and cannot overlap. The primary use case of this feature is to transcribe key segments of the audio, while ignoring the rest.
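The two constraints on segments (at least 1 minute long, no overlaps) can be validated before submission. The sketch below assumes segments are expressed as start and end offsets in seconds and serialized as objects with start and end keys. That shape, the function name validate_segments, and the choice to allow one segment to begin exactly where the previous one ends are all assumptions, not details given on this page.

```python
from typing import Dict, List, Tuple

MIN_SEGMENT_SECONDS = 60  # "at least 1 minute in length"


def validate_segments(
    segments: List[Tuple[float, float]]
) -> List[Dict[str, float]]:
    """Check length and overlap rules, then return the assumed payload shape."""
    ordered = sorted(segments)

    for start, end in ordered:
        if end - start < MIN_SEGMENT_SECONDS:
            raise ValueError(
                f"segment {start}-{end} is shorter than "
                f"{MIN_SEGMENT_SECONDS} seconds"
            )

    # After sorting by start time, only neighbours can overlap.
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start < prev_end:
            raise ValueError(
                f"segment starting at {next_start} overlaps the one "
                f"ending at {prev_end}"
            )

    return [{"start": start, "end": end} for start, end in ordered]
```

Sorting first keeps the overlap check linear in the number of segments and means callers can pass segments in any order.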

Speaker names

Use the speaker_names parameter to specify a list of speaker names to be given to the human transcriber. This list may be no more than 100 names long and each name is limited to 50 characters. Note that even though a name is provided, it may not necessarily be used, as the human transcriber still needs to be able to distinguish individual speaker voices from the provided audio.
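The limits on the speaker list can be enforced in the same way. In this sketch the function name build_speaker_names is hypothetical, and the display_name key used for each entry is an assumption about the payload shape. Only the limits of 100 names and 50 characters per name come from the text above.

```python
from typing import Dict, List

MAX_SPEAKER_NAMES = 100
MAX_NAME_LENGTH = 50


def build_speaker_names(names: List[str]) -> List[Dict[str, str]]:
    """Validate the limits and return the assumed speaker_names payload."""
    if len(names) > MAX_SPEAKER_NAMES:
        raise ValueError(
            f"{len(names)} names supplied, maximum is {MAX_SPEAKER_NAMES}"
        )
    for name in names:
        if len(name) > MAX_NAME_LENGTH:
            raise ValueError(
                f"name {name!r} is longer than {MAX_NAME_LENGTH} characters"
            )
    # "display_name" is an assumed key, not confirmed by this page.
    return [{"display_name": name} for name in names]
```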

Test mode

Set the test_mode parameter to true to mock a normal human transcription job. No transcription will happen in this case. The primary use case is to test integrations without being charged for human transcription.

Jobs submitted with this option will be in the in_progress state for a few minutes before completing and returning a dummy transcript.
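Because a test mode job stays in the in_progress state for a few minutes, an integration test usually needs a polling loop. The sketch below keeps the HTTP call out of scope: get_status is a hypothetical callable supplied by the caller that returns the job's current status string, and the default interval and attempt count are arbitrary choices. Only the in_progress status name comes from this page.

```python
import time
from typing import Callable


def wait_for_job(get_status: Callable[[], str],
                 poll_seconds: float = 30.0,
                 max_polls: int = 20,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the job leaves in_progress and return its final status.

    get_status is a caller-supplied function (hypothetical) that fetches
    the current status of one job.
    """
    for _ in range(max_polls):
        status = get_status()
        if status != "in_progress":
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"job still in_progress after {max_polls} polls")
```

Passing sleep as a parameter lets a unit test replace it with a no-op, so the loop can be exercised without waiting.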