Transcription Options

Several transcription options are available, using either machine (ASR) or human transcription.

Machine transcription

V2 ASR model

The v2 ASR model is our latest, most advanced ASR model. It supports English only and has full feature parity with our older v1 ASR model. It is our recommended model for all ASR transcription requests.

The v2 ASR model represents an entirely new architecture that promises 25-30% relative improvement in accuracy. Read more in our blog post describing the new v2 ASR model.

V1 ASR model

The v1 ASR model is our older ASR model, now available only to some users and for a limited period of time.

attention

The v1 ASR model and related user preference will be deprecated on September 8, 2022.

Default model

For new pay-as-you-go (PAYG) and enterprise user accounts created on or after March 7, 2022, the default transcription model is the v2 ASR model.

For existing PAYG user accounts, the default transcription model is:

  • the v1 ASR model until Apr 7, 2022
  • the v2 ASR model after Apr 7, 2022

For existing enterprise user accounts, the default transcription model is:

  • the v1 ASR model until Sep 7, 2022
  • the v2 ASR model after Sep 7, 2022

When no transcriber option is provided, or if the transcriber option is set to machine, transcription will be performed by the default model in effect for the user at the given time.

When the transcriber option is set to machine_v2, transcription will always be performed by the v2 ASR model, regardless of the default model in effect.

attention

The transcriber option is ignored for non-English transcription requests.
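
For illustration, here is a minimal sketch of explicitly selecting the v2 ASR model when submitting an asynchronous transcription job. The endpoint URL, the media_url field, and the token handling are assumptions about the async jobs API, and the token and media URL are placeholders.

```python
import requests

# Placeholder values -- substitute your own access token and media URL.
ACCESS_TOKEN = "<REVAI_ACCESS_TOKEN>"
JOBS_URL = "https://api.rev.ai/speechtotext/v1/jobs"  # assumed async jobs endpoint

payload = {
    "media_url": "https://example.com/interview.mp3",  # placeholder audio file
    # Omit "transcriber" (or set it to "machine") to use the default model in
    # effect for your account; "machine_v2" always selects the v2 ASR model.
    "transcriber": "machine_v2",
}

response = requests.post(
    JOBS_URL,
    json=payload,
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
)
response.raise_for_status()
print(response.json()["id"])  # job id (field name assumed) to poll for completion
```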

Early migration

Existing PAYG and enterprise users can request early migration to the new v2 ASR model by contacting the support team (PAYG accounts) or their account representative (enterprise accounts).

Deferred migration

For a limited transition period, existing PAYG and enterprise users can request continued use of the older v1 ASR model as their default by submitting a request form (PAYG accounts) or contacting their account representative (enterprise accounts).

Summary

The above information is summarized in the following table:

|  | Migration date | Default model¹ before migration date | Default model¹ after migration date | No transcriber OR transcriber: machine | transcriber: machine_v2 |
| --- | --- | --- | --- | --- | --- |
| PAYG / enterprise account created on / after Mar 7, 2022 | N/A | v2 | v2 | Default model | v2 model |
| PAYG account created before Mar 7, 2022 | Apr 7, 2022 | v1 | v2 | Default model | v2 model |
| Enterprise account created before Mar 7, 2022 | Sep 7, 2022 | v1 | v2 | Default model | v2 model |

¹ Subject to user preference for early/deferred migration

Human transcription

attention

This feature is currently under development in Rev AI Labs.

When the transcriber option is set to human, the audio file will be transcribed by a human. Because the job is handled by a human transcriber, the expected behavior differs from that of the ASR transcription services. Human transcription is only available for English transcription requests.
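
As a sketch, a human transcription request uses the same submission flow as the machine example above, with the transcriber option set to human. The media URL is a placeholder, and the human-only parameters described under "Additional request parameters" below are added to this same payload.

```python
# Same submission flow as the machine sketch above; only the payload changes.
human_payload = {
    "media_url": "https://example.com/board-meeting.mp3",  # placeholder audio file
    "transcriber": "human",  # route the job to a human transcriber
}
```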

List pricing

Product Cost (per Minute)
Human Transcriber $1.50
Rush (Add-on) +$1.25
Verbatim (Add-on) +$0.50

Human transcription files are charged per second with a minimum charge of 1 minute.
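
As an illustration of the list pricing above, this sketch applies per-second billing with the 1-minute minimum. It assumes straightforward proration of the per-minute rates, so treat it as an estimate rather than an invoicing rule.

```python
def human_transcription_cost(duration_seconds: float,
                             rush: bool = False,
                             verbatim: bool = False) -> float:
    """Estimate list-price cost: per-second billing with a 1-minute minimum."""
    per_minute = 1.50 + (1.25 if rush else 0.0) + (0.50 if verbatim else 0.0)
    billable_seconds = max(duration_seconds, 60)  # 1-minute minimum charge
    return round(billable_seconds / 60 * per_minute, 2)

# 10-minute file with the Verbatim add-on: 10 * (1.50 + 0.50) = $20.00
print(human_transcription_cost(600, verbatim=True))
```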

Please reach out to labs@rev.ai if you have any questions on pricing.

attention

Enterprise pricing is available for enterprise customers. Contact your Rev Account Manager for more information.

Turnaround time

The expected turnaround time is 12 to 24 hours for human transcription jobs.

Allowed content

Certain types of audio content that our transcribers consider unworkable will not be transcribed. Audio that consists primarily of non-English speech or music may be considered unworkable; valid audio consists primarily of spoken English.

Additional request parameters

Rush

attention

Rush charges apply.

Set the rush parameter to true to increase the priority of the job to be worked on by human transcribers.

Verbatim

attention

Verbatim charges apply.

Set the verbatim parameter to true to tell the human transcriber to transcribe every syllable spoken, so the transcript will include disfluencies (e.g., 'umm', 'ah') and false starts. When not specified or set to false, the transcribers will follow the Rev.com transcription style guide.
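
Both add-ons are plain boolean flags on the job request. A sketch extending the hypothetical human_payload from the earlier example:

```python
human_payload.update({
    "rush": True,      # prioritize the job (Rush charges apply)
    "verbatim": True,  # transcribe every syllable (Verbatim charges apply)
})
```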

Segments to transcribe

Use the segments_to_transcribe parameter to specify which sections of the audio file need to be transcribed. Segments must be at least two minutes in length and cannot overlap. The primary use case of this feature is to transcribe key segments of the audio, while ignoring the rest.
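
A sketch of the segment list, again extending the hypothetical human_payload above. The start/end field names and units (seconds) are assumptions, so check the API reference for the exact shape.

```python
human_payload["segments_to_transcribe"] = [
    # Assumed shape: start/end offsets in seconds; segments must be at least
    # two minutes long and must not overlap.
    {"start": 0.0, "end": 180.0},    # first three minutes
    {"start": 600.0, "end": 780.0},  # minutes 10-13
]
```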

Speaker names

Use the speaker_names parameter to specify a list of speaker names to be given to the human transcriber. The list may contain no more than 100 names, and each name is limited to 50 characters. Note that even if a name is provided, it may not necessarily be used, as the human transcriber still needs to be able to distinguish individual speaker voices in the provided audio.
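
A sketch of the speaker list, continuing the same hypothetical payload; the display_name field is an assumption about the expected shape.

```python
human_payload["speaker_names"] = [
    # Assumed shape: one entry per expected speaker, up to 100 entries,
    # each name limited to 50 characters.
    {"display_name": "Alice Nguyen"},
    {"display_name": "Bob Patel"},
]
```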

Test mode

Set the test_mode parameter to true to mock a normal human transcription job. No transcription will happen in this case. The primary use case is to test integrations without being charged for human transcription.

Jobs submitted with this option will be in the in_progress state for a few minutes before completing and returning a dummy transcript.
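
A self-contained sketch of an integration test using test_mode; the endpoint, field names, and placeholder values follow the same assumptions as the earlier examples.

```python
import requests

# Placeholder values -- substitute your own access token and media URL.
ACCESS_TOKEN = "<REVAI_ACCESS_TOKEN>"
JOBS_URL = "https://api.rev.ai/speechtotext/v1/jobs"  # assumed async jobs endpoint

test_payload = {
    "media_url": "https://example.com/board-meeting.mp3",
    "transcriber": "human",
    "test_mode": True,  # mock the job: no transcription, no human transcription charges
}

response = requests.post(
    JOBS_URL,
    json=test_payload,
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
)
response.raise_for_status()
# The job stays in_progress for a few minutes, then completes with a dummy transcript.
```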