Jobs

Get Job By Id

Returns information about a transcription job

Security: AccessToken
Request
path Parameters
id
required
string

Rev AI API Job Id

Responses
200

Transcription Job Details

401

Request Unauthorized

403

User does not have permission to access this deployment

404

Job Not Found

GET /jobs/{id}
Request samples
curl -X GET "https://api.rev.ai/speechtotext/v1/jobs/{id}" -H "Authorization: Bearer $REV_ACCESS_TOKEN"
Response samples
application/json
{
  "id": "Umx5c6F7pH7r",
  "status": "in_progress",
  "language": "en",
  "created_on": "2018-05-05T23:23:22.29Z",
  "transcriber": "machine"
}
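For readers not using curl, the same request can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the helper names build_get_job_request and fetch_job are hypothetical, and only the base URL, path, and Authorization header come from the curl sample above.

```python
import json
import urllib.request
from urllib.parse import quote

# Base URL copied from the curl sample above.
BASE_URL = "https://api.rev.ai/speechtotext/v1"


def build_get_job_request(job_id: str, access_token: str) -> urllib.request.Request:
    """Build (without sending) the GET /jobs/{id} request. Hypothetical helper."""
    return urllib.request.Request(
        url=f"{BASE_URL}/jobs/{quote(job_id, safe='')}",
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


def fetch_job(job_id: str, access_token: str) -> dict:
    """Send the request and decode the JSON body. Hypothetical helper."""
    request = build_get_job_request(job_id, access_token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling fetch_job("Umx5c6F7pH7r", token) would perform the network call; a 401, 403, or 404 response surfaces as urllib.error.HTTPError.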

Delete Job by Id

Deletes a transcription job. All data related to the job, such as input media and transcript, will be permanently deleted. A job can only be deleted once it's completed (either with success or failure).

Security: AccessToken
Request
path Parameters
id
required
string

Rev AI API Job Id

Responses
204

Job was successfully deleted

401

Request Unauthorized

403

User does not have permission to access this deployment

404

Job Not Found

409

Conflict

DELETE /jobs/{id}
Request samples
curl -X DELETE "https://api.rev.ai/speechtotext/v1/jobs/{id}" -H "Authorization: Bearer $REV_ACCESS_TOKEN"
Response samples
application/problem+json
{
  "title": "Authorization has been denied for this request",
  "status": 401
}
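A client might turn the status codes listed above, plus the optional application/problem+json body, into a readable message. The sketch below assumes only the "title" field shown in the response sample; the function name describe_delete_response is hypothetical.

```python
import json

# Status descriptions copied from the Responses list for DELETE /jobs/{id}.
DELETE_JOB_OUTCOMES = {
    204: "Job was successfully deleted",
    401: "Request Unauthorized",
    403: "User does not have permission to access this deployment",
    404: "Job Not Found",
    409: "Conflict",
}


def describe_delete_response(status_code: int, body: str = "") -> str:
    """Combine the documented status text with the problem title, if any."""
    message = DELETE_JOB_OUTCOMES.get(status_code, "Unexpected status")
    if body:
        try:
            problem = json.loads(body)
        except json.JSONDecodeError:
            problem = {}
        # Only the "title" field from the response sample is assumed to exist.
        title = problem.get("title") if isinstance(problem, dict) else None
        if title:
            message = f"{message}: {title}"
    return f"{status_code} {message}"
```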

Get List of Jobs

Gets a list of transcription jobs submitted within the last 30 days, in reverse chronological order, returning at most limit jobs per call. Note: Jobs older than 30 days will not be listed. Pagination is supported by passing the last job id from a previous call as starting_after.

Security: AccessToken
Request
query Parameters
limit
integer or null [ 0 .. 1000 ]
Default: 100

Limits the number of jobs returned. The default is 100 and the maximum is 1000.

starting_after
string or null

If specified, returns jobs submitted before the job with this id, exclusive (job with this id is not included)

Responses
200

List of Rev AI Transcription Jobs

400

Bad Request

401

Request Unauthorized

403

User does not have permission to access this deployment

GET /jobs
Request samples
# Get list of jobs with a limit of 10 jobs
curl -X GET "https://api.rev.ai/speechtotext/v1/jobs?limit=10" -H "Authorization: Bearer $REV_ACCESS_TOKEN"

# Get list of jobs starting after (submitted before) job with id Umx5c6F7pH7r
curl -X GET "https://api.rev.ai/speechtotext/v1/jobs?starting_after=Umx5c6F7pH7r" -H "Authorization: Bearer $REV_ACCESS_TOKEN"
Response samples
application/json
[
  {
  }
]
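The pagination described for starting_after can be sketched as a generator. This is an illustration under assumptions: fetch_page is a hypothetical callable standing in for a GET /jobs call that returns the decoded list of job objects, and a page shorter than limit is taken to mean there are no more jobs.

```python
from typing import Callable, Iterator, List, Optional


def iter_jobs(
    fetch_page: Callable[[int, Optional[str]], List[dict]],
    limit: int = 100,
) -> Iterator[dict]:
    """Yield every listed job by chaining calls with starting_after."""
    if not 0 <= limit <= 1000:
        raise ValueError("limit must be between 0 and 1000")
    starting_after: Optional[str] = None
    while True:
        page = fetch_page(limit, starting_after)
        if not page:
            return
        yield from page
        if len(page) < limit:
            # Assumption: a short page means the listing is exhausted.
            return
        # The last job id of this page becomes the cursor for the next call.
        starting_after = page[-1]["id"]
```

Because the listing is in reverse chronological order, each subsequent page contains jobs submitted before the cursor job.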

Submit Transcription Job

Starts an asynchronous job to transcribe speech-to-text for a media file. Media files can be specified in two ways: either by including a public url to the media in the transcription job options, or by uploading a local file as part of a multipart/form-data request.

Security: AccessToken
Request
Request Body schema:

Transcription Job Options

media_url
string <= 2048 characters
Deprecated

[HIPAA Unsupported] Deprecated. Use source_config instead. Direct download media url. Ignored if submitting job from file. Note: Most languages support media files with a duration of up to 17 hours, with the exception of Telugu (te), which has a limit of 6 hours. For non-English jobs, expected turnaround time can be up to 6 hours. If this parameter is used to pass in the media url, the media url will be visible in the response. It is recommended to use the source_config parameter instead, as authorization headers can be included and both the media url and auth headers will be encrypted when stored.

source_config
object or null

Optional authorization headers, if they are needed to access the resource at the URL. Headers could be a single Authorization header of the form <scheme> <token>, and one of the AWS signature v4 headers. Only one of source_config and media_url may be set. This option will not be visible in the submission response.

metadata
string or null <= 512 characters

Optional metadata that was provided during job submission.

callback_url
string or null <= 1024 characters
Deprecated

Deprecated. Use notification_config instead. Optional callback url to invoke when processing is complete. If this parameter is used to pass in the callback url, the callback url will be visible in the response. It is recommended to provide webhooks with the notification_config parameter as authorization headers can be included and both the callback url and auth headers will be encrypted when stored.

notification_config
object or null

Optional configuration for a callback url to invoke when processing is complete, in addition to auth headers if they are needed to invoke the callback url. Cannot be set if callback_url is set. This option will not be visible in the submission response.

delete_after_seconds
integer or null [ 0 .. 2592000 ]

Amount of time after job completion when job is auto-deleted. Present only when preference set in job request.

transcriber
string or null
Default: "machine"

Select which service you would like to transcribe this file with.

Model | Description
machine | The default; routes to our standard (Reverb) model.
human | [HIPAA Unsupported] Routes the file to our human transcribers.
low_cost | Low-cost transcription that uses a quantized ASR model (Reverb Turbo) in a low-cost environment.
fusion | Higher-quality ASR that combines multiple models to achieve the best results. Typically has better support for rare words.
Enum: "machine" "human" "low_cost" "fusion"
verbatim
boolean

Configures the transcriber to transcribe every syllable. This will include all false starts and disfluencies in the transcript.

The behavior depends on the transcriber option:

Transcriber | Description
machine | The default is true. To turn it off, false must be explicitly provided.
human | The default is false. To turn it on, true must be explicitly provided.
rush
boolean or null
Default: false

[HIPAA Unsupported] Only available for human transcriber option. When this field is set to true, your job is given higher priority and will be worked on sooner by our human transcribers.

test_mode
boolean or null
Default: false

[HIPAA Unsupported] Only available for human transcriber option. When this field is set to true, the behavior will mock a normal human transcription job, except that no transcription will happen. The primary use case is to test integrations without being charged for human transcription.

Array of objects or null

[HIPAA Unsupported] Only available for human transcriber option. Use this option to specify which sections of the transcript need to be transcribed. Segments must be at least 1 minute in length and cannot overlap.
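The two segment constraints above (at least 1 minute long, no overlap) can be checked client-side before submitting. This sketch deliberately represents each segment as a (start_seconds, end_seconds) tuple, because the field names of the segment objects are not given here; the function name is hypothetical, and treating segments that merely touch as non-overlapping is an assumption.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds), an assumed shape

MIN_SEGMENT_SECONDS = 60  # "at least 1 minute in length"


def validate_segments(segments: Sequence[Segment]) -> List[Segment]:
    """Return the segments sorted by start time, or raise ValueError."""
    ordered = sorted(segments)
    for start, end in ordered:
        if end - start < MIN_SEGMENT_SECONDS:
            raise ValueError(f"segment {start}-{end} is shorter than 1 minute")
    # After sorting, only neighbouring segments need to be compared.
    for (_, previous_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start < previous_end:
            raise ValueError(f"segment starting at {next_start} overlaps the previous one")
    return ordered
```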

Array of objects or null [ 0 .. 100 ] items

[HIPAA Unsupported] Only available for human transcriber option. Use this option to specify up to 100 names of speakers in the transcript. Names may only be up to 50 characters long.

skip_diarization
boolean or null
Default: false

Specify if speaker diarization will be skipped by the speech engine

skip_postprocessing
boolean or null
Default: false

Only available for English and Spanish languages. User-supplied preference on whether to skip post-processing operations such as inverse text normalization (ITN), casing and punctuation.

skip_punctuation
boolean or null
Default: false

Specify if "punct" type elements will be skipped by the speech engine. For JSON outputs, this includes removing spaces. For text outputs, words will still be delimited by a space

remove_disfluencies
boolean or null
Default: false

Currently, only 'ums' and 'uhs' are treated as disfluencies. When set to true, disfluencies will not appear in the transcript. This option also removes atmospherics if remove_atmospherics is not set. This option is not available for human transcription jobs.

remove_atmospherics
boolean or null
Default: false

We define many atmospherics, such as <laugh> and <affirmative>. When set to true, atmospherics will not appear in the transcript. This option is not available for human transcription jobs.

filter_profanity
boolean or null
Default: false

Enabling this option will filter for approx. 600 profanities, which cover most use cases. If a transcribed word matches a word on this list, then all the characters of that word will be replaced by asterisks except for the first and last character.
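The masking rule described above (every character replaced by an asterisk except the first and last) can be illustrated as follows. This only demonstrates the output shape; the function name is hypothetical, the actual word list is not reproduced, and leaving words of one or two characters unchanged is an assumption.

```python
def mask_word(word: str) -> str:
    """Replace all characters except the first and last with asterisks."""
    if len(word) <= 2:
        # Assumption: nothing to mask between the first and last character.
        return word
    return word[0] + "*" * (len(word) - 2) + word[-1]
```

For instance, applying mask_word to the placeholder "example" keeps the outer letters and masks the five in between.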

speaker_channels_count
integer or null [ 1 .. 8 ]

Only available for English, Spanish and French languages. Use to specify the total number of unique speaker channels in the audio.

Given the number of audio channels provided, each channel will be transcribed separately and the channel id assigned to the speaker label. The final output will be a combination of all individual channel outputs. Overlapping monologues are ordered by when the first spoken element of each monologue occurs. If speaker_channels_count is greater than the actual number of channels in the audio, the job will fail with invalid_media. This option is not available for human transcription jobs.

Best practice:

  • If you have speakers recorded on individual audio channels in the same file (for example, Interviewer on channel 1 (left channel) and Applicant on channel 2 (right channel)), then use speaker_channels_count. There are extra costs to this approach because the channels are transcribed separately. Benefits: perfect diarization, higher accuracy transcription during periods of long cross-talk (speakers talking over one another).
  • If you have generic audio with an unknown number of speakers, do not specify speakers_count or speaker_channels_count.

Note:

  • The amount charged will be the duration of the file multiplied by the number of channels specified.
  • When using speaker_channels_count each channel will be diarized as one speaker, and the value of skip_diarization will be ignored if provided
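The charging rule in the note above is a simple product, sketched here with a hypothetical helper name. The 1 to 8 range check mirrors the documented bounds of speaker_channels_count; the unit (seconds) is an assumption made for the example.

```python
def billed_duration_seconds(file_duration_seconds: float, speaker_channels_count: int) -> float:
    """Duration of the file multiplied by the number of channels specified."""
    if not 1 <= speaker_channels_count <= 8:
        raise ValueError("speaker_channels_count must be between 1 and 8")
    return file_duration_seconds * speaker_channels_count
```

A 10-minute stereo interview (600 seconds, 2 channels) would therefore be charged as 1200 seconds.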
speakers_count
integer or null >= 1
Default: null

Only available for English, Spanish and French languages. Use to specify the total number of unique speakers in the audio.

Given the count of speakers provided, it will be used to improve the diarization accuracy. This option is not available for human transcription jobs.

Best practice:

  • If you have audio with multiple speakers and the number of speakers is definitively known, use speakers_count. This will provide a hint to the speech engine to make sure the correct number of speakers is identified.
  • If you have generic audio with an unknown number of speakers, do not specify speakers_count or speaker_channels_count.

Note:

  • When using speaker_channels_count each channel will be diarized based on the speakers count provided.
  • When using skip_diarization the speakers count will be ignored if provided.
diarization_type
string or null
Default: "standard"

Use to specify diarization type. This option is not available for human transcription jobs or in the low-cost environment.

Enum: "standard" "premium"
custom_vocabulary_id
string or null

This feature is in beta. You can supply the id of a pre-completed custom vocabulary that you submitted through the Custom Vocabularies API instead of uploading the list of phrases using the custom_vocabularies parameter. Using custom_vocabulary_id or custom_vocabularies with the same list of phrases yields the same transcription result, but custom_vocabulary_id enables your submission to finish processing faster by 6 seconds on average.

You cannot use both custom_vocabulary_id and custom_vocabularies at the same time, and doing so will result in a 400 response. If the supplied id represents an incomplete, deleted, or non-existent custom vocabulary then you will receive a 404 response.

custom_vocabularies
Array of objects [ 1 .. 50 ] items

Specify a collection of custom vocabulary to be used for this job. Custom vocabulary informs and biases the speech recognition to find those phrases (at the cost of slightly slower transcription).

strict_custom_vocabulary
boolean

If true, only exact phrases will be used as custom vocabulary, i.e. phrases will not be split into individual words for processing. Enabled by default.

object or null

Use to specify summarization options. This option is not available for human transcription jobs.

object or null

Use to specify translation options. This option is not available for human transcription jobs.

language
string or null
Default: "en"

language is provided as an ISO 639-1 language code, with the following exceptions:

  • Multilingual English/Spanish (en/es), which does not follow any existing ISO language code convention
  • English US (en-us), which is an ISO 639-1 language code with region United States
  • English UK (en-gb), which is an ISO 639-1 language code with region United Kingdom
  • Mandarin (cmn), which is supplied as an ISO 639-3 language code

Only 1 language can be selected per audio, i.e. no multiple languages in one transcription job. Additionally, the following parameters may not be used with non-English languages: skip_punctuation, remove_disfluencies, filter_profanity, speaker_channels_count, custom_vocabulary_id.

You can provide a language parameter for transcribing audio in one of the following languages:

Language Language Code HIPAA Support (US) EU Deployment Support
Multilingual English/Spanish en/es x
Afrikaans af x
Arabic ar x x
Armenian hy x
Azerbaijani az x
Belarusian be x
Bosnian bs x
Bulgarian bg x x
Catalan ca x x
Croatian hr x x
Czech cs x x
Danish da x x
Dutch nl x x
English en x x
English (UK) en-gb x x
English (US) en-us x x
Estonian et x
Farsi fa x x
Finnish fi x x
French fr x x
Galician gl x
German de x x
Greek el x x
Hebrew he x x
Hindi hi x x
Hungarian hu x x
Icelandic is x
Indonesian id x x
Italian it x x
Japanese ja x x
Kannada kn x
Kazakh kk x
Korean ko x x
Latvian lv x x
Lithuanian lt x x
Macedonian mk x
Malay ms x x
Mandarin cmn x x
Marathi mr x
Nepali ne x
Norwegian no x x
Polish pl x x
Portuguese pt x x
Romanian ro x x
Russian ru x x
Serbian sr x
Slovak sk x x
Slovenian sl x x
Spanish es x x
Swahili sw x
Swedish sv x x
Tagalog tl x
Tamil ta x x
Telugu te x x
Thai th x
Turkish tr x x
Ukrainian uk x
Urdu ur x
Vietnamese vi x
Welsh cy x
forced_alignment
boolean or null
Default: false

Provides improved accuracy for per-word timestamps for a transcript.

The following languages are currently supported:

  • English (en, en-us, en-gb)
  • French (fr)
  • Italian (it)
  • German (de)
  • Spanish (es)

This option is not available in the low-cost environment.

Responses
200

Transcription Job Details

400

Bad Request

401

Request Unauthorized

403

Request Forbidden

413

Payload Too Large


Only returned when the job is submitted using a local file as part of multipart/form-data. Submit a job with the source_config parameter for files larger than 2 GB.

POST /jobs
Request samples
{
  "metadata": "example metadata",
  "notification_config": {},
  "source_config": {},
  "transcriber": "machine",
  "skip_diarization": false,
  "skip_punctuation": false,
  "skip_postprocessing": false,
  "remove_disfluencies": false,
  "filter_profanity": false,
  "speaker_channels_count": 1,
  "delete_after_seconds": 2592000,
  "custom_vocabulary_id": null,
  "language": "en"
}
Response samples
application/json
{
  "id": "Umx5c6F7pH7r",
  "status": "in_progress",
  "language": "en",
  "created_on": "2018-05-05T23:23:22.29Z",
  "transcriber": "machine"
}
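Several of the job options documented above constrain each other (for example, only one of source_config and media_url may be set) or have numeric and length bounds. A client could check these before calling POST /jobs. The sketch below encodes only rules stated in this section; the function name check_job_options is hypothetical, and the server remains the authority on what is accepted.

```python
from typing import List

# Pairs that the parameter descriptions say cannot be set together.
MUTUALLY_EXCLUSIVE = [
    ("source_config", "media_url"),
    ("notification_config", "callback_url"),
    ("custom_vocabulary_id", "custom_vocabularies"),
]
# Documented inclusive integer ranges.
INTEGER_RANGES = {"delete_after_seconds": (0, 2592000), "speaker_channels_count": (1, 8)}
# Documented maximum string lengths.
MAX_LENGTHS = {"media_url": 2048, "metadata": 512, "callback_url": 1024}
TRANSCRIBERS = {"machine", "human", "low_cost", "fusion"}


def check_job_options(options: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems: List[str] = []
    set_keys = {key for key, value in options.items() if value is not None}
    for first, second in MUTUALLY_EXCLUSIVE:
        if first in set_keys and second in set_keys:
            problems.append(f"{first} and {second} cannot both be set")
    for key, (low, high) in INTEGER_RANGES.items():
        if key in set_keys and not low <= options[key] <= high:
            problems.append(f"{key} must be between {low} and {high}")
    for key, max_length in MAX_LENGTHS.items():
        if key in set_keys and len(options[key]) > max_length:
            problems.append(f"{key} must be at most {max_length} characters")
    transcriber = options.get("transcriber")
    if transcriber is not None and transcriber not in TRANSCRIBERS:
        problems.append(f"transcriber must be one of {sorted(TRANSCRIBERS)}")
    return problems
```

Returning a list instead of raising on the first problem lets a caller report every issue at once.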

Transcripts

Get Transcript By Id

Returns the transcript for a completed transcription job. Transcript can be returned as either JSON or plaintext format. Transcript output format can be specified in the Accept header. Returns JSON by default.


Note: For streaming jobs, transient failure of our storage during a live session may prevent the final hypothesis elements from saving properly, resulting in an incomplete transcript. This is rare, but not impossible. To guarantee 100% completeness, we recommend capturing all final hypotheses when you receive them on the client.

Security: AccessToken
Request
path Parameters
id
required
string

Rev AI API Job Id

query Parameters
group_channels_by
string or null
Default: "word"

Specifies the grouping strategy for organizing tokens in the output transcript for multichannel inputs (speaker_channels_count > 1). The parameter determines the granularity of grouping based on speaker changes or interruptions. Allowed values are:

  • speaker: Groups all tokens spoken by the same speaker, preserving continuity in monologues.
  • sentence: Groups tokens at the sentence level. Sentences remain intact even when interrupted by another speaker.
  • word: Groups tokens at the word level. In cases of interruptions, monologues are split per word, ensuring no overlap. This parameter works together with group_channels_threshold_ms to define how interruptions between speakers are handled.
Enum: "speaker" "word" "sentence"
group_channels_threshold_ms
integer or null [ 0 .. 5000 ]
Default: 100

Defines the maximum time delay (in milliseconds) allowed between tokens of different speakers to prevent splitting tokens of the current speaker. This parameter is used alongside group_channels_by to handle interruptions. For example:

  • When set to a low value, interruptions from other speakers are more likely to split tokens into separate groups.
  • When set to a high value, interruptions are tolerated longer, allowing ongoing phrases or words to complete before splitting. The value ensures continuity within the specified grouping type, such as completing a word when grouping by word, or keeping a sentence intact when grouping by sentence.

Note: This parameter is ignored when group_channels_by is set to speaker, as tokens are always grouped by speaker without considering time delays.

header Parameters
Accept
string

MIME type specifying the transcription output format

Enum: "application/vnd.rev.transcript.v1.0+json" "text/plain"
Responses
200

Rev AI API Transcript


Note: Transcript output format is required in the Accept header. Output can either be in Rev's JSON format or plaintext.

401

Request Unauthorized

403

User does not have permission to access this deployment

404

Job Not Found

406

Invalid Transcript Format

409

Conflict

GET /jobs/{id}/transcript
Request samples
curl -X GET "https://api.rev.ai/speechtotext/v1/jobs/{id}/transcript" -H "Authorization: Bearer $REV_ACCESS_TOKEN" -H "Accept: application/vnd.rev.transcript.v1.0+json"
Response samples
{
  "monologues": [
  ]
}
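The two query parameters above, group_channels_by and group_channels_threshold_ms, are appended to the transcript URL. The sketch below builds such a URL and enforces the documented enum and range; the helper name build_transcript_url is hypothetical, and the base URL is taken from the curl sample.

```python
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://api.rev.ai/speechtotext/v1"
GROUP_CHANNELS_BY = {"speaker", "word", "sentence"}


def build_transcript_url(
    job_id: str,
    group_channels_by: Optional[str] = None,
    group_channels_threshold_ms: Optional[int] = None,
) -> str:
    """Build the GET /jobs/{id}/transcript URL with optional grouping parameters."""
    params = {}
    if group_channels_by is not None:
        if group_channels_by not in GROUP_CHANNELS_BY:
            raise ValueError("group_channels_by must be speaker, word, or sentence")
        params["group_channels_by"] = group_channels_by
    if group_channels_threshold_ms is not None:
        if not 0 <= group_channels_threshold_ms <= 5000:
            raise ValueError("group_channels_threshold_ms must be between 0 and 5000")
        params["group_channels_threshold_ms"] = group_channels_threshold_ms
    url = f"{BASE_URL}/jobs/{quote(job_id, safe='')}/transcript"
    return f"{url}?{urlencode(params)}" if params else url
```

The request itself still needs the Authorization and Accept headers shown in the curl sample.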

Get Translated Transcript By Id

Returns translated transcript for a completed transcription job. Translation must be requested as part of the submitted job. Transcript can be returned in either JSON or plaintext format. Transcript output format can be specified in the Accept header. Returns JSON by default.

Security: AccessToken
Request
path Parameters
id
required
string

Rev AI API Job Id

language
required
string

When requesting translated transcript, it is important to specify the language code that corresponds to your desired translation language. This language code should be one of the target languages you previously defined in your job submission.

Enum: "en" "en-us" "en-gb" "ar" "pt" "pt-br" "pt-pt" "fr" "fr-ca" "es" "es-es" "es-la" "it" "ja" "ko" "de" "ru"
Example: en
header Parameters
Accept
string

MIME type specifying the transcription output format

Enum: "application/vnd.rev.transcript.v1.0+json" "text/plain"
Responses
200

Rev AI API Transcript


Note: Transcript output format is required in the Accept header. Output can either be in Rev's JSON format or plaintext.

401

Request Unauthorized

403

User does not have permission to access this deployment

404

Job Not Found

406

Invalid Transcript Format

409

Conflict

GET /jobs/{id}/transcript/translation/{languageId}
Request samples
curl -X GET "https://api.rev.ai/speechtotext/v1/jobs/{id}/transcript/translation/{language}" -H "Authorization: Bearer $REV_ACCESS_TOKEN" -H "Accept: application/vnd.rev.transcript.v1.0+json"
Response samples
{
  "monologues": [
  ]
}
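Since the language path parameter must be one of the listed codes, a client can reject unknown codes before making the call. A minimal sketch, assuming the hypothetical helper name build_translated_transcript_url; the tuple below is copied from the Enum above. Whether a given code was actually requested at job submission can only be confirmed by the API.

```python
from urllib.parse import quote

BASE_URL = "https://api.rev.ai/speechtotext/v1"

# Copied from the Enum of the language path parameter.
TRANSLATION_LANGUAGES = (
    "en", "en-us", "en-gb", "ar", "pt", "pt-br", "pt-pt", "fr", "fr-ca",
    "es", "es-es", "es-la", "it", "ja", "ko", "de", "ru",
)


def build_translated_transcript_url(job_id: str, language: str) -> str:
    """Build the translated-transcript URL for one target language."""
    if language not in TRANSLATION_LANGUAGES:
        raise ValueError(f"unsupported translation language: {language!r}")
    return f"{BASE_URL}/jobs/{quote(job_id, safe='')}/transcript/translation/{language}"
```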

Get Transcript Summary By Id

Returns the transcript summary for a completed transcription job. Summary can be returned as either JSON or plaintext format. Summary output format can be specified in the Accept header. Returns plaintext by default.

Security: AccessToken
Request
path Parameters
id
required
string

Rev AI API Job Id

header Parameters
Accept
string
Default: text/plain

MIME type specifying summary output format

Enum: "text/plain" "application/json"
Responses
200

Transcript summary.

401

Request Unauthorized

403

User does not have permission to access this deployment

404

Job Not Found

409

Conflict

GET /jobs/{id}/transcript/summary
Request samples
curl -X GET "https://api.rev.ai/speechtotext/v1/jobs/{id}/transcript/summary" -H "Authorization: Bearer $REV_ACCESS_TOKEN" -H "Accept: text/plain"
Response samples
No sample

Captions

Get Captions

Returns the caption output for a transcription job. We currently support SubRip (SRT) and Web Video Text Tracks (VTT) output. Caption output format can be specified in the Accept header. Returns SRT by default.


Note: For streaming jobs, transient failure of our storage during a live session may prevent the final hypothesis elements from saving properly, resulting in an incomplete caption file. This is rare, but not impossible.

Security: AccessToken
Request
path Parameters
id
required
string

Rev AI API Job Id

query Parameters
speaker_channel
integer

Identifies which channel of the job output to caption. Default is null which works only for jobs with no speaker_channels_count provided during job submission.

header Parameters
Accept
string

MIME type specifying the caption output format

Enum: "application/x-subrip" "text/vtt"
Responses
200

Rev AI API Captions


Note: Caption output format is required in the Accept header. The supported values are application/x-subrip (SRT) and text/vtt (VTT).

401

Request Unauthorized

403

User does not have permission to access this deployment

404

Job Not Found

405

Invalid Job Property

406

Invalid Caption Format

409

Conflict

GET /jobs/{id}/captions
Request samples
curl -X GET "https://api.rev.ai/speechtotext/v1/jobs/{id}/captions" -H "Authorization: Bearer $REV_ACCESS_TOKEN" -H "Accept: application/x-subrip"
Response samples
1
00:00:01,210 --> 00:00:04,840
Hello there, this is an example captions output

2
00:00:07,350 --> 00:00:10,970
Each caption group is in the SubRip Text
file format
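An SRT response like the sample above can be split into caption groups with a small parser. This is a sketch that handles only the plain index, timing, and text layout shown in the sample; the names Cue and parse_srt are hypothetical, and blocks that do not match that layout are skipped.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    index: int
    start_ms: int
    end_ms: int
    text: str


# Matches timing lines such as "00:00:01,210 --> 00:00:04,840".
_TIMING = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3}) --> (\d+):(\d{2}):(\d{2}),(\d{3})")


def _to_ms(hours: str, minutes: str, seconds: str, millis: str) -> int:
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def parse_srt(srt_text: str) -> List[Cue]:
    """Parse SRT text into cues; caption groups are separated by blank lines."""
    cues: List[Cue] = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 2 or not lines[0].strip().isdigit():
            continue
        match = _TIMING.match(lines[1].strip())
        if not match:
            continue
        parts = match.groups()
        text = "\n".join(lines[2:])
        cues.append(Cue(int(lines[0].strip()), _to_ms(*parts[:4]), _to_ms(*parts[4:]), text))
    return cues
```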

Get Translated Captions

Returns translated caption output for a transcription job. Translation must be requested as part of the submitted job. We currently support SubRip (SRT) and Web Video Text Tracks (VTT) output. Caption output format can be specified in the Accept header. Returns SRT by default.

Security: AccessToken
Request
path Parameters
id
required
string

Rev AI API Job Id

language
required
string

When requesting translated captions, it is important to specify the language code that corresponds to your desired translation language. This language code should be one of the target languages you previously defined in your job submission.

Enum: "en" "en-us" "en-gb" "ar" "pt" "pt-br" "pt-pt" "fr" "fr-ca" "es" "es-es" "es-la" "it" "ja" "ko" "de" "ru"
Example: en
header Parameters
Accept
string

MIME type specifying the caption output format

Enum: "application/x-subrip" "text/vtt"
Responses
200

Rev AI API Captions


Note: Caption output format is required in the Accept header. The supported values are application/x-subrip (SRT) and text/vtt (VTT).

401

Request Unauthorized

403

User does not have permission to access this deployment

404

Job Not Found

405

Invalid Job Property

406

Invalid Caption Format

409

Conflict

GET /jobs/{id}/captions/translation/{languageId}
Request samples
curl -X GET "https://api.rev.ai/speechtotext/v1/jobs/{id}/captions/translation/{language}" -H "Authorization: Bearer $REV_ACCESS_TOKEN" -H "Accept: application/x-subrip"
Response samples
1
00:00:01,210 --> 00:00:04,840
Hello there, this is an example captions output

2
00:00:07,350 --> 00:00:10,970
Each caption group is in the SubRip Text
file format

Accounts

Get Account

Get the developer's account information

Security: AccessToken
Responses
200

Rev AI Account

401

Request Unauthorized

GET /account
Request samples
curl -X GET "https://api.rev.ai/speechtotext/v1/account" -H "Authorization: Bearer $REV_ACCESS_TOKEN"
Response samples
application/json
{
  "email": "example@rev.ai",
  "free_balance": 5.5,
  "purchased_balance": 8.5,
  "total_balance": 14,
  "invoiced_balance": -9.5,
  "balance_seconds": 0,
  "hipaa_enabled": true
}