Transcription Options
A number of transcription options are available, both machine and human.
Machine transcription
V2 ASR model
The v2 ASR model is our latest, most advanced ASR model. It supports English only and has full feature parity with our older v1 ASR model. It is our recommended model for all ASR transcription requests.
The v2 ASR model represents an entirely new architecture that promises 25-30% relative improvement in accuracy. Read more in our blog post describing the new v2 ASR model.
V1 ASR model
The v1 ASR model is our older model, now available only to select users and only for a limited time.
attention
The v1 ASR model and related user preference will be deprecated on September 8, 2022.
Default model
For new pay-as-you-go (PAYG) and enterprise user accounts created after March 7, 2022, the default transcription model is the v2 ASR model.
For existing PAYG user accounts, the default transcription model is:
- the v1 ASR model until Apr 7, 2022
- the v2 ASR model after Apr 7, 2022
For existing enterprise user accounts, the default transcription model is:
- the v1 ASR model until Sep 7, 2022
- the v2 ASR model after Sep 7, 2022
When no `transcriber` option is provided, or if the `transcriber` option is set to `machine`, transcription will be performed by the default model in effect for the user at the given time.

When the `transcriber` option is set to `machine_v2`, transcription will always be performed by the v2 ASR model, regardless of the default model in effect.
attention
The `transcriber` option is ignored for non-English transcription requests.
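For illustration, the following is a minimal sketch of submitting an asynchronous job with the v2 model forced via the `transcriber` option. The jobs endpoint and the `source_config` media URL field shown here are assumptions; confirm the exact request shape against the API reference for submitting jobs.

```python
import requests

ACCESS_TOKEN = "<REVAI_ACCESS_TOKEN>"                   # hypothetical access token
MEDIA_URL = "https://example.com/recordings/call.mp3"   # hypothetical audio URL

response = requests.post(
    "https://api.rev.ai/speechtotext/v1/jobs",           # assumed async jobs endpoint
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json={
        "source_config": {"url": MEDIA_URL},             # assumed media source field
        # Always use the v2 ASR model, regardless of the account's default model.
        "transcriber": "machine_v2",
    },
)
response.raise_for_status()
print(response.json()["id"])  # assumed job id field, used to poll for the transcript
```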
Early migration
Existing PAYG and enterprise users can request early migration to the new v2 ASR model by contacting the support team (for PAYG accounts) or contacting their account representative (for enterprise accounts).
Deferred migration
For a limited transition period, existing PAYG and enterprise users can request continued usage of the older v1 ASR model as default by filling a request form (for PAYG accounts) or contacting their account representative (for enterprise accounts).
Summary
The above information is summarized in the following table:
Account type | Migration date | Default model¹ before migration date | Default model¹ after migration date | No `transcriber` OR `transcriber: machine` | `transcriber: machine_v2` |
---|---|---|---|---|---|
PAYG / enterprise account created on / after Mar 7, 2022 | N/A | v2 | v2 | Default model | v2 model |
PAYG account created before Mar 7, 2022 | Apr 7, 2022 | v1 | v2 | Default model | v2 model |
Enterprise account created before Mar 7, 2022 | Sep 7, 2022 | v1 | v2 | Default model | v2 model |
¹ Subject to user preference for early/deferred migration.
Human transcription
attention
This feature is currently under development in Rev AI Labs.
When the `transcriber` option is set to `human`, the audio file will be transcribed by a human. Since the job is handled by a human transcriber, the expected behavior differs from the ASR transcription services. Human transcription is only available for English transcription requests.
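As a sketch, and reusing the assumed field names from the machine transcription example above, a human transcription request only changes the `transcriber` value:

```python
# Job options for a human transcription request (field names assumed; English audio only).
job_options = {
    "source_config": {"url": "https://example.com/recordings/interview.mp3"},  # hypothetical URL
    "transcriber": "human",
}
```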
List pricing
Product | Cost (per Minute) |
---|---|
Human Transcriber | $1.50 |
Rush (Add-on) | +$1.25 |
Verbatim (Add-on) | +$0.50 |
Human transcription files are charged per second with a minimum charge of 1 minute.
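For example, a 90-second file at the $1.50 per minute list rate is billed for 1.5 minutes ($2.25), while a 30-second file is billed at the 1-minute minimum ($1.50).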
Please reach out to labs@rev.ai if you have any questions on pricing.
attention
Enterprise pricing is available for enterprise customers. Contact your Rev Account Manager for more information.
Turnaround time
The expected turnaround time is 12 to 24 hours for human transcription jobs.
Allowed content
Certain types of audio content that our transcribers consider unworkable will not be transcribed. Audio that consists primarily of non-English speech or music may be considered unworkable. Valid audio consists primarily of spoken English.
Additional request parameters
Rush
attention
Rush charges apply.
Set the `rush` parameter to `true` to increase the priority of the job for human transcribers.
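For instance, using the same assumed job options shape as above:

```python
# Human transcription job with rush priority (rush charges apply).
job_options = {
    "source_config": {"url": "https://example.com/recordings/earnings-call.mp3"},  # hypothetical URL
    "transcriber": "human",
    "rush": True,
}
```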
Verbatim
attention
Verbatim charges apply.
Set the `verbatim` parameter to `true` to tell the human transcriber to transcribe every syllable spoken; the transcript will then include disfluencies (e.g. "umm", "ah") and false starts. When this parameter is not specified or is set to `false`, the transcribers will follow the Rev.com transcription style guide.
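A sketch of a verbatim request, with the same assumed option names:

```python
# Verbatim human transcription: disfluencies and false starts are kept (verbatim charges apply).
job_options = {
    "source_config": {"url": "https://example.com/recordings/focus-group.mp3"},  # hypothetical URL
    "transcriber": "human",
    "verbatim": True,
}
```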
Segments to transcribe
Use the `segments_to_transcribe` parameter to specify which sections of the audio file need to be transcribed. Segments must be at least two minutes in length and cannot overlap. The primary use case for this feature is to transcribe key segments of the audio while ignoring the rest.
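A sketch of a segmented request is shown below. The segment object shape (start and end offsets in seconds) is an assumption; only the constraint that segments be at least two minutes long and non-overlapping comes from the text above.

```python
# Transcribe only two key passages of a long recording (segment shape assumed).
job_options = {
    "source_config": {"url": "https://example.com/recordings/town-hall.mp3"},  # hypothetical URL
    "transcriber": "human",
    "segments_to_transcribe": [
        {"start": 0, "end": 180},    # first 3 minutes
        {"start": 600, "end": 780},  # minutes 10 through 13
    ],
}
```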
Speaker names
Use the `speaker_names` parameter to specify a list of speaker names to be given to the human transcriber. This list may be no more than 100 names long, and each name is limited to 50 characters. Note that even though a name is provided, it may not necessarily be used, as the human transcriber still needs to be able to distinguish individual speaker voices in the provided audio.
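A sketch passing speaker names follows; the per-name object shape (a `display_name` field) is an assumption.

```python
# Speaker name hints for the human transcriber (name object shape assumed).
job_options = {
    "source_config": {"url": "https://example.com/recordings/panel.mp3"},  # hypothetical URL
    "transcriber": "human",
    "speaker_names": [
        {"display_name": "Dana Host"},
        {"display_name": "Riley Guest"},
    ],
}
```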
Test mode
Set the `test_mode` parameter to `true` to mock a normal human transcription job. No transcription will happen in this case. The primary use case is to test integrations without being charged for human transcription.

Jobs submitted with this option will be in the `in_progress` state for a few minutes before completing and returning a dummy transcript.
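For example, an integration test might submit a mocked job like the sketch below (field names assumed as before) and then poll it until the dummy transcript is returned.

```python
# Mock human transcription job: nothing is transcribed and no transcription charge is incurred.
job_options = {
    "source_config": {"url": "https://example.com/recordings/sample.mp3"},  # hypothetical URL
    "transcriber": "human",
    "test_mode": True,
}
```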