Jobs

Get Job By Id

Returns information about a transcription job

Security: AccessToken

Request

path Parameters

id required	string Rev AI API Job Id

Responses

200

Transcription Job Details

401

Request Unauthorized

403

User does not have permission to access this deployment

404

Job Not Found

get/jobs/{id}

Request samples

curl -X GET "https://api.rev.ai/speechtotext/v1/jobs/{id}" -H "Authorization: Bearer $REV_ACCESS_TOKEN"

Response samples

application/json

{"id": "Umx5c6F7pH7r",
"status": "in_progress",
"language": "en",
"created_on": "2018-05-05T23:23:22.29Z",
"transcriber": "machine"
}
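A client that does not use webhooks typically polls this endpoint until the job leaves the in_progress state. The following is a minimal sketch of such a loop, not an official client: fetch_job is a hypothetical callable that performs the GET request above and returns the parsed JSON, and treating every status other than "in_progress" as finished is an assumption.

```python
import time
from typing import Callable, Dict


def wait_for_job(
    fetch_job: Callable[[str], Dict],
    job_id: str,
    interval_seconds: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll a job until its status is no longer "in_progress".

    fetch_job is a hypothetical helper supplied by the caller; it should
    call GET /jobs/{id} and return the decoded JSON body as a dict.
    """
    for attempt in range(max_attempts):
        job = fetch_job(job_id)
        # Assumption: any status other than "in_progress" means the job is done.
        if job.get("status") != "in_progress":
            return job
        # Do not sleep after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"job {job_id} still in progress after {max_attempts} checks")
```

Injecting fetch_job and sleep keeps the loop independent of any particular HTTP library and makes it easy to exercise without network access.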

Delete Job by Id

Deletes a transcription job. All data related to the job, such as input media and transcript, will be permanently deleted. A job can only be deleted once it's completed (either with success or failure).

Security: AccessToken

Request

path Parameters

id required	string Rev AI API Job Id

Responses

204

Job was successfully deleted

401

Request Unauthorized

403

User does not have permission to access this deployment

404

Job Not Found

409

Conflict

delete/jobs/{id}

Request samples

curl -X DELETE "https://api.rev.ai/speechtotext/v1/jobs/{id}" -H "Authorization: Bearer $REV_ACCESS_TOKEN"

Response samples

application/problem+json

{"title": "Authorization has been denied for this request",
"status": 401
}
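As a rough illustration, the DELETE call and the documented response codes can be wrapped as below. This is a sketch using only the Python standard library; the function names are hypothetical, and the short outcome labels are this example's own wording rather than values returned by the API.

```python
import urllib.request

BASE_URL = "https://api.rev.ai/speechtotext/v1"

# Response codes listed for this endpoint, mapped to labels chosen for this example.
DELETE_OUTCOMES = {
    204: "deleted",
    401: "unauthorized",
    403: "forbidden",
    404: "not found",
    409: "conflict",
}


def build_delete_request(job_id: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE /jobs/{id} request."""
    return urllib.request.Request(
        f"{BASE_URL}/jobs/{job_id}",
        method="DELETE",
        headers={"Authorization": f"Bearer {access_token}"},
    )


def describe_delete_status(status_code: int) -> str:
    """Translate a response code into one of the labels above."""
    return DELETE_OUTCOMES.get(status_code, "unexpected")
```

The request can be sent with urllib.request.urlopen; a 204 response carries no body.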

Get List of Jobs

Gets a list of transcription jobs submitted within the last 30 days, in reverse chronological order, up to limit jobs per call. Note: Jobs older than 30 days will not be listed. To paginate, pass the id of the last job returned by the previous call as starting_after.

Security: AccessToken

Request

query Parameters

limit	integer or null [ 0 .. 1000 ] Default: 100 Limits the number of jobs returned, default is 100, max is 1000
starting_after	string or null If specified, returns jobs submitted before the job with this id, exclusive (job with this id is not included)

Responses

200

List of Rev AI Transcription Jobs

400

Bad Request

401

Request Unauthorized

403

User does not have permission to access this deployment

get/jobs

Request samples

# Get list of jobs with a limit of 10 jobs
curl -X GET "https://api.rev.ai/speechtotext/v1/jobs?limit=10" -H "Authorization: Bearer $REV_ACCESS_TOKEN"

# Get list of jobs starting after (submitted before) job with id Umx5c6F7pH7r
curl -X GET "https://api.rev.ai/speechtotext/v1/jobs?starting_after=Umx5c6F7pH7r" -H "Authorization: Bearer $REV_ACCESS_TOKEN"

Response samples

application/json

[{"id": "Umx5c6F7pH7r",
"status": "in_progress",
"created_on": "2018-05-05T23:23:22.29Z",
"type": "async",
"delete_after_seconds": 50,
"transcriber": "machine"
}
]
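The pagination scheme described for this endpoint (pass the last job id of one page as starting_after for the next) can be sketched as a generator. This is an illustrative sketch only: fetch_page is a hypothetical callable that sends GET /jobs with the given query parameters and returns the decoded JSON list, and stopping when a page is shorter than limit is an assumption about how the end of the list appears.

```python
from typing import Callable, Dict, Iterator, List


def iter_jobs(
    fetch_page: Callable[[Dict[str, object]], List[Dict]],
    limit: int = 100,
) -> Iterator[Dict]:
    """Yield every listed job, newest first, following starting_after."""
    # The API accepts 0..1000; this helper requires at least 1 so each page can make progress.
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    starting_after = None
    while True:
        params: Dict[str, object] = {"limit": limit}
        if starting_after is not None:
            params["starting_after"] = starting_after
        page = fetch_page(params)
        if not page:
            return
        yield from page
        # Assumption: a short page means there are no older jobs left.
        if len(page) < limit:
            return
        # starting_after is exclusive, so the next page begins after this job.
        starting_after = page[-1]["id"]
```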

Submit Transcription Job

Starts an asynchronous job to transcribe speech-to-text for a media file. Media files can be specified in two ways, either by including a public url to the media in the transcription job options or by uploading a local file as part of a multipart/form request.

Security: AccessToken

Request

Request Body schema:
application/json

Transcription Job Options

media_url

string <= 2048 characters

Deprecated

[HIPAA Unsupported] Deprecated. Use source_config instead. Direct download media url. Ignored if submitting job from file. Note: Most languages support media files with duration up to 17 hours, with the exception of Telugu (te), which has a limit of 6 hours. For non-English jobs, expected turnaround time can be up to 6 hours. If this parameter is used to pass in the media url, the media url will be visible in the response. It is recommended to use the source_config parameter instead, as authorization headers can be included and both the media url and auth headers will be encrypted when stored.

source_config

object or null

Optional authorization headers, if they are needed to access the resource at the URL. Headers could be a single Authorization header of the form <scheme> <token>, and one of the AWS signature v4 headers. Only one of source_config and media_url may be set. This option will not be visible in the submission response.

metadata

string or null <= 512 characters

Optional metadata that was provided during job submission.

callback_url

string or null <= 1024 characters

Deprecated

Deprecated. Use notification_config instead. Optional callback url to invoke when processing is complete. If this parameter is used to pass in the callback url, the callback url will be visible in the response. It is recommended to provide webhooks with the notification_config parameter as authorization headers can be included and both the callback url and auth headers will be encrypted when stored.

notification_config

object or null

Optional configuration for a callback url to invoke when processing is complete, in addition to auth headers if they are needed to invoke the callback url. Cannot be set if callback_url is set. This option will not be visible in the submission response.

delete_after_seconds

integer or null [ 0 .. 2592000 ]

Amount of time after job completion when job is auto-deleted. Present only when preference set in job request.

transcriber

string or null

Default: "machine"

Select which service you would like to transcribe this file with.

Model	Description
`machine`	the default and routes to our standard (Reverb) model.
`human`	[HIPAA Unsupported] routes the file to our human transcribers.
`low_cost`	low-cost transcription which uses quantized ASR model (Reverb Turbo) with low-cost environment.
`fusion`	higher quality ASR that combines multiple models to achieve the best results. Typically has better support for rare words.

Enum: "machine" "human" "low_cost" "fusion"

verbatim

boolean

Configures the transcriber to transcribe every syllable. This will include all false starts and disfluencies in the transcript.

The behavior depends on the transcriber option:

Transcriber	Description
`machine`	the default is true. To turn it off, false must be explicitly provided.
`human`	the default is false. To turn it on, true must be explicitly provided.

rush

boolean or null

Default: false

[HIPAA Unsupported] Only available for human transcriber option. When this field is set to true, your job is given higher priority and will be worked on sooner by our human transcribers.

test_mode

boolean or null

Default: false

[HIPAA Unsupported] Only available for human transcriber option. When this field is set to true, the job mocks a normal human transcription job, except that no transcription happens. The primary use case is to test integrations without being charged for human transcription.

Array of objects or null

[HIPAA Unsupported] Only available for human transcriber option. Use this option to specify which sections of the transcript need to be transcribed. Segments must be at least 1 minute in length and cannot overlap.

Array of objects or null [ 0 .. 100 ] items

[HIPAA Unsupported] Only available for human transcriber option. Use this option to specify up to 100 names of speakers in the transcript. Names may only be up to 50 characters long.

skip_diarization

boolean or null

Default: false

Specify if speaker diarization will be skipped by the speech engine

skip_postprocessing

boolean or null

Default: false

Only available for English and Spanish languages. User-supplied preference on whether to skip post-processing operations such as inverse text normalization (ITN), casing and punctuation.

skip_punctuation

boolean or null

Default: false

Specify if "punct" type elements will be skipped by the speech engine. For JSON outputs, this includes removing spaces. For text outputs, words will still be delimited by a space

remove_disfluencies

boolean or null

Default: false

Currently we only define disfluencies as 'ums' and 'uhs'. When set to true, disfluencies will not appear in the transcript. This option also removes atmospherics if the remove_atmospherics is not set. This option is not available for human transcription jobs.

remove_atmospherics

boolean or null

Default: false

We define many atmospherics, such as <laugh>, <affirmative>, etc. When set to true, atmospherics will not appear in the transcript. This option is not available for human transcription jobs.

filter_profanity

boolean or null

Default: false

Enabling this option will filter for approx. 600 profanities, which cover most use cases. If a transcribed word matches a word on this list, then all the characters of that word will be replaced by asterisks except for the first and last character.
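The masking rule described above (every character replaced by an asterisk except the first and last) can be illustrated with a short sketch. This is not the service's implementation; the function name is hypothetical, and leaving words of one or two characters unchanged is an assumption, since the text does not say how they are handled.

```python
def mask_profanity(word: str) -> str:
    """Replace all but the first and last character of a word with asterisks."""
    # Assumption: words with no interior characters are returned unchanged.
    if len(word) <= 2:
        return word
    return word[0] + "*" * (len(word) - 2) + word[-1]
```

For example, a four-letter word keeps its first and last letters and gains two asterisks in between.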

speaker_channels_count

integer or null [ 1 .. 8 ]

Only available for English, Spanish and French languages. Use to specify the total number of unique speaker channels in the audio.

Given the number of audio channels provided, each channel will be transcribed separately and the channel id assigned to the speaker label. The final output will be a combination of all individual channel outputs. Overlapping monologues will have ordering broken by the order in which the first spoken element of each monologue occurs. If speaker_channels_count is greater than the actual channels in the audio, the job will fail with invalid_media. This option is not available for human transcription jobs.

Best practice:

If you have speakers recorded on individual audio channels in the same file (example: Interviewer on channel 1 (left channel) and Applicant on channel 2 (right channel)), then use speaker_channels_count. There are extra costs to this approach because the channels are transcribed separately. Benefits: perfect diarization, higher accuracy transcription during periods of long cross-talk (speakers talking over one another).
If you have generic audio with an unknown number of speakers, do not specify speakers_count or speaker_channels_count

Note:

The amount charged will be the duration of the file multiplied by the number of channels specified.
When using speaker_channels_count each channel will be diarized as one speaker, and the value of skip_diarization will be ignored if provided
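The billing note above amounts to a single multiplication. The sketch below spells it out; the function name is hypothetical, and it computes billable duration only, not a price, because no rates are given here.

```python
def billed_seconds(duration_seconds: float, speaker_channels_count: int = 1) -> float:
    """Billable duration: file duration multiplied by the number of channels specified."""
    # The documented range for speaker_channels_count is 1 to 8.
    if not 1 <= speaker_channels_count <= 8:
        raise ValueError("speaker_channels_count must be between 1 and 8")
    return duration_seconds * speaker_channels_count
```

A 10-minute (600 second) stereo interview submitted with speaker_channels_count set to 2 would therefore be billed as 1200 seconds.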

speakers_count

integer or null >= 1

Default: null

Only available for English, Spanish and French languages. Use to specify the total number of unique speakers in the audio.

Given the count of speakers provided, it will be used to improve the diarization accuracy. This option is not available for human transcription jobs.

Best practice:

If you have audio with multiple speakers and the number of speakers is definitively known, use speakers_count. This will provide a hint to the speech engine to make sure the correct number of speakers is identified.
If you have generic audio with an unknown number of speakers, do not specify speakers_count or speaker_channels_count

Note:

When using speaker_channels_count each channel will be diarized based on the speakers count provided.
When using skip_diarization the speakers count will be ignored if provided.

diarization_type

string or null

Default: "standard"

Use to specify diarization type. This option is not available for human transcription jobs and low-cost environment.

Enum: "standard" "premium"

custom_vocabulary_id

string or null

This feature is in beta. You can supply the id of a pre-completed custom vocabulary that you submitted through the Custom Vocabularies API instead of uploading the list of phrases using the custom_vocabularies parameter. Using custom_vocabulary_id or custom_vocabularies with the same list of phrases yields the same transcription result, but custom_vocabulary_id enables your submission to finish processing faster by 6 seconds on average.

You cannot use both custom_vocabulary_id and custom_vocabularies at the same time, and doing so will result in a 400 response. If the supplied id represents an incomplete, deleted, or non-existent custom vocabulary then you will receive a 404 response.

custom_vocabularies

Array of objects [ 1 .. 50 ] items

Specify a collection of custom vocabulary to be used for this job. Custom vocabulary informs and biases the speech recognition to find those phrases (at the cost of slightly slower transcription).

strict_custom_vocabulary

boolean

If true, only exact phrases will be used as custom vocabulary, i.e. phrases will not be split into individual words for processing. Enabled by default.

object or null

Use to specify summarization options. This option is not available for human transcription jobs.

object or null

Use to specify translation options. This option is not available for human transcription jobs.

language

string or null

Default: "en"

language is provided as an ISO 639-1 language code, with the following exceptions:

Multilingual English/Spanish (en/es), which does not follow any existing ISO language code convention
English US (en-us), which is an ISO 639-1 language code with region United States
English UK (en-gb), which is an ISO 639-1 language code with region United Kingdom
Mandarin (cmn), which is supplied as an ISO 639-3 language code

Only 1 language can be selected per audio, i.e. no multiple languages in one transcription job. Additionally, the following parameters may not be used with non-English languages: skip_punctuation, remove_disfluencies, filter_profanity, speaker_channels_count, custom_vocabulary_id.

You can provide a language parameter for transcribing audio in one of the following languages:

Language	Language Code	HIPAA Support (US)	EU Deployment Support
Multilingual English/Spanish	en/es	x
Afrikaans	af	x
Arabic	ar	x	x
Armenian	hy	x
Azerbaijani	az	x
Belarusian	be	x
Bosnian	bs	x
Bulgarian	bg	x	x
Catalan	ca	x	x
Croatian	hr	x	x
Czech	cs	x	x
Danish	da	x	x
Dutch	nl	x	x
English	en	x	x
English (UK)	en-gb	x	x
English (US)	en-us	x	x
Estonian	et	x
Farsi	fa	x	x
Finnish	fi	x	x
French	fr	x	x
Galician	gl	x
German	de	x	x
Greek	el	x	x
Hebrew	he	x	x
Hindi	hi	x	x
Hungarian	hu	x	x
Icelandic	is	x
Indonesian	id	x	x
Italian	it	x	x
Japanese	ja	x	x
Kannada	kn	x
Kazakh	kk	x
Korean	ko	x	x
Latvian	lv	x	x
Lithuanian	lt	x	x
Macedonian	mk	x
Malay	ms	x	x
Mandarin	cmn	x	x
Marathi	mr	x
Nepali	ne	x
Norwegian	no	x	x
Polish	pl	x	x
Portuguese	pt	x	x
Romanian	ro	x	x
Russian	ru	x	x
Serbian	sr	x
Slovak	sk	x	x
Slovenian	sl	x	x
Spanish	es	x	x
Swahili	sw	x
Swedish	sv	x	x
Tagalog	tl	x
Tamil	ta	x	x
Telugu	te	x	x
Thai	th	x
Turkish	tr	x	x
Ukrainian	uk	x
Urdu	ur	x
Vietnamese	vi	x
Welsh	cy	x

forced_alignment

boolean or null

Default: false

Provides improved accuracy for per-word timestamps for a transcript.

The following languages are currently supported:

English (en, en-us, en-gb)
French (fr)
Italian (it)
German (de)
Spanish (es)

This option is not available in low-cost environment

Responses

200

Transcription Job Details

400

Bad Request

401

Request Unauthorized

403

Request Forbidden

413

Payload Too Large

Only returned when a job is submitted using a local file as part of multipart/form-data. Submit a job with the source_config parameter for files larger than 2 GB.

post/jobs

Request samples

{"metadata": "example metadata",
"notification_config": {"url": "https://www.example.com/callback",
"auth_headers": {"Authorization": "Bearer <notification-url-token>"
}
},
"source_config": {"url": "https://www.rev.ai/FTC_Sample_1.mp3",
"auth_headers": {"Authorization": "Bearer <source-url-token>"
}
},
"transcriber": "machine",
"skip_diarization": false,
"skip_punctuation": false,
"skip_postprocessing": false,
"remove_disfluencies": false,
"filter_profanity": false,
"speaker_channels_count": 1,
"delete_after_seconds": 2592000,
"custom_vocabulary_id": null,
"language": "en"
}
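Several of the options above are constrained: custom_vocabulary_id and custom_vocabularies cannot be combined, and delete_after_seconds must lie between 0 and 2592000. A client might check these before submitting. The following sketch builds a request body in the shape of the sample above; the function and its parameter names are hypothetical, and it deliberately uses source_config and notification_config instead of the deprecated media_url and callback_url.

```python
from typing import Any, Dict, List, Optional

MAX_DELETE_AFTER_SECONDS = 2_592_000  # 30 days, the documented upper bound


def build_job_options(
    source_url: str,
    source_auth_headers: Optional[Dict[str, str]] = None,
    notification_url: Optional[str] = None,
    notification_auth_headers: Optional[Dict[str, str]] = None,
    custom_vocabulary_id: Optional[str] = None,
    custom_vocabularies: Optional[List[Dict[str, Any]]] = None,
    delete_after_seconds: Optional[int] = None,
    language: str = "en",
) -> Dict[str, Any]:
    """Assemble a JSON-serializable body for POST /jobs."""
    # Sending both vocabulary options is documented to produce a 400 response.
    if custom_vocabulary_id is not None and custom_vocabularies is not None:
        raise ValueError("use custom_vocabulary_id or custom_vocabularies, not both")
    if delete_after_seconds is not None and not (
        0 <= delete_after_seconds <= MAX_DELETE_AFTER_SECONDS
    ):
        raise ValueError("delete_after_seconds must be between 0 and 2592000")

    source_config: Dict[str, Any] = {"url": source_url}
    if source_auth_headers:
        source_config["auth_headers"] = dict(source_auth_headers)
    options: Dict[str, Any] = {"source_config": source_config, "language": language}

    if notification_url is not None:
        notification_config: Dict[str, Any] = {"url": notification_url}
        if notification_auth_headers:
            notification_config["auth_headers"] = dict(notification_auth_headers)
        options["notification_config"] = notification_config
    if custom_vocabulary_id is not None:
        options["custom_vocabulary_id"] = custom_vocabulary_id
    if custom_vocabularies is not None:
        options["custom_vocabularies"] = custom_vocabularies
    if delete_after_seconds is not None:
        options["delete_after_seconds"] = delete_after_seconds
    return options
```

The returned dict can be serialized with json.dumps and sent with a Content-Type of application/json.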

Response samples

application/json

{"id": "Umx5c6F7pH7r",
"status": "in_progress",
"language": "en",
"created_on": "2018-05-05T23:23:22.29Z",
"transcriber": "machine"
}

Transcripts

Get Transcript By Id

Returns the transcript for a completed transcription job. Transcript can be returned as either JSON or plaintext format. Transcript output format can be specified in the Accept header. Returns JSON by default.

Note: For streaming jobs, transient failure of our storage during a live session may prevent the final hypothesis elements from saving properly, resulting in an incomplete transcript. This is rare, but not impossible. To guarantee 100% completeness, we recommend capturing all final hypotheses when you receive them on the client.

Security: AccessToken

Request

path Parameters

id required	string Rev AI API Job Id

query Parameters

group_channels_by

string or null

Default: "word"

Specifies the grouping strategy for organizing tokens in the output transcript for multichannel inputs (speaker_channels_count > 1). The parameter determines the granularity of grouping based on speaker changes or interruptions. Allowed values are:

speaker: Groups all tokens spoken by the same speaker, preserving continuity in monologues.
sentence: Groups tokens at the sentence level. Sentences remain intact even when interrupted by another speaker.
word: Groups tokens at the word level. In cases of interruptions, monologues are split per word, ensuring no overlap. This parameter works together with group_channels_threshold_ms to define how interruptions between speakers are handled.

Enum: "speaker" "word" "sentence"

group_channels_threshold_ms

integer or null [ 0 .. 5000 ]

Default: 100

Defines the maximum time delay (in milliseconds) allowed between tokens of different speakers to prevent splitting tokens of the current speaker. This parameter is used alongside group_channels_by to handle interruptions. For example:

When set to a low value, interruptions from other speakers are more likely to split tokens into separate groups.
When set to a high value, interruptions are tolerated longer, allowing ongoing phrases or words to complete before splitting. The value ensures continuity within the specified grouping type, such as completing a word when grouping by word, or keeping a sentence intact when grouping by sentence.

Note: This parameter is ignored when group_channels_by is set to speaker, as tokens are always grouped by speaker without considering time delays.
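To show how the two query parameters fit together, here is a sketch that validates them against the documented enum and range and builds the request URL. The function name is hypothetical, and omitting group_channels_threshold_ms when grouping by speaker is a choice made for this example, since the parameter is ignored in that case.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.rev.ai/speechtotext/v1"
GROUP_CHANNELS_BY = {"speaker", "word", "sentence"}


def transcript_url(
    job_id: str,
    group_channels_by: str = "word",
    group_channels_threshold_ms: int = 100,
) -> str:
    """Build the GET /jobs/{id}/transcript URL with grouping parameters."""
    if group_channels_by not in GROUP_CHANNELS_BY:
        raise ValueError("group_channels_by must be speaker, word, or sentence")
    if not 0 <= group_channels_threshold_ms <= 5000:
        raise ValueError("group_channels_threshold_ms must be between 0 and 5000")
    params = {"group_channels_by": group_channels_by}
    # The threshold is ignored when grouping by speaker, so it is left out here.
    if group_channels_by != "speaker":
        params["group_channels_threshold_ms"] = str(group_channels_threshold_ms)
    return f"{BASE_URL}/jobs/{quote(job_id, safe='')}/transcript?{urlencode(params)}"
```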

header Parameters

Accept

string

MIME type specifying the transcription output format

Enum: "application/vnd.rev.transcript.v1.0+json" "text/plain"

Responses

200

Rev AI API Transcript

Note: Transcript output format is required in the Accept header. Output can either be in Rev's JSON format or plaintext.

401

Request Unauthorized

403

User does not have permission to access this deployment

404

Job Not Found

406

Invalid Transcript Format

409

Conflict

get/jobs/{id}/transcript

Request samples

curl -X GET "https://api.rev.ai/speechtotext/v1/jobs/{id}/transcript" -H "Authorization: Bearer $REV_ACCESS_TOKEN" -H "Accept: application/vnd.rev.transcript.v1.0+json"

Response samples

{"monologues": [{"speaker": 1,
"elements": [{"type": "text",
"value": "Hello",
"ts": 0.5,
"end_ts": 1.5,
"confidence": 1
},
{"type": "punct",
"value": " "
},
{"type": "text",
"value": "World",
"ts": 1.75,
"end_ts": 2.85,
"confidence": 0.8
},
{"type": "punct",
"value": "."
}
]
},
{"speaker": 2,
"elements": [{"type": "text",
"value": "monologues",
"ts": 3,
"end_ts": 3.5,
"confidence": 1
},
{"type": "punct",
"value": " "
},
{"type": "text",
"value": "are",
"ts": 3.6,
"end_ts": 3.9,
"confidence": 1
},
{"type": "punct",
"value": " "
},
{"type": "text",
"value": "a",
"ts": 4,
"end_ts": 4.3,
"confidence": 1
},
{"type": "punct",
"value": " "
},
{"type": "text",
"value": "block",
"ts": 4.5,
"end_ts": 5.5,
"confidence": 1
},
{"type": "punct",
"value": " "
},
{"type": "text",
"value": "of",
"ts": 5.75,
"end_ts": 6.14,
"confidence": 1
},
{"type": "punct",
"value": " "
},
{"type": "unknown",
"value": "<inaudible>"
},
{"type": "punct",
"value": " "
},
{"type": "text",
"value": "text",
"ts": 6.5,
"end_ts": 7.78,
"confidence": 1
},
{"type": "punct",
"value": "."
}
]
}
]
}
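The JSON transcript above is a list of monologues, each holding a speaker number and a sequence of elements whose value fields concatenate into readable text. A minimal sketch of that flattening follows; the function name and the "Speaker N:" line prefix are choices made for this example and are not necessarily the layout of the API's own text/plain output.

```python
from typing import Any, Dict


def monologues_to_text(transcript: Dict[str, Any]) -> str:
    """Flatten a JSON transcript into one line of text per monologue."""
    lines = []
    for monologue in transcript.get("monologues", []):
        # Text, punct, and unknown elements all carry a "value" to concatenate.
        text = "".join(element["value"] for element in monologue.get("elements", []))
        lines.append(f"Speaker {monologue['speaker']}: {text}")
    return "\n".join(lines)
```

Because punct elements include the spaces between words, no extra separator is needed when joining.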

Get Translated Transcript By Id

Returns translated transcript for a completed transcription job. Translation must be requested as part of the submitted job. Transcript can be returned in either JSON or plaintext format. Transcript output format can be specified in the Accept header. Returns JSON by default.

Security: AccessToken

Request

path Parameters

id required	string Rev AI API Job Id
language required	string When requesting translated transcript, it is important to specify the language code that corresponds to your desired translation language. This language code should be one of the target languages you previously defined in your job submission. Enum: "en" "en-us" "en-gb" "ar" "pt" "pt-br" "pt-pt" "fr" "fr-ca" "es" "es-es" "es-la" "it" "ja" "ko" "de" "ru" Example: en

header Parameters

Accept

string

MIME type specifying the transcription output format

Enum: "application/vnd.rev.transcript.v1.0+json" "text/plain"

Responses

200

Rev AI API Transcript

Note: Transcript output format is required in the Accept header. Output can either be in Rev's JSON format or plaintext.

401

Request Unauthorized

403

User does not have permission to access this deployment

404

Job Not Found

406

Invalid Transcript Format

409

Conflict

get/jobs/{id}/transcript/translation/{languageId}

Request samples

curl -X GET "https://api.rev.ai/speechtotext/v1/jobs/{id}/transcript/translation/{language}" -H "Authorization: Bearer $REV_ACCESS_TOKEN" -H "Accept: application/vnd.rev.transcript.v1.0+json"

Response samples

{"monologues": [{"speaker": 1,
"speaker_info": {"id": 1,
"display_name": "Jane Doe"
},
"elements": [{"type": "text",
"value": "Hello",
"ts": 0,
"end_ts": 0,
"confidence": 0.85
}
]
}
]
}
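Because the language path parameter is restricted to a fixed set of codes, a client can reject an unsupported code before making the call. The sketch below does this using the enum listed for this endpoint; the function name is hypothetical, and it cannot check whether the language was actually requested as a target at job submission.

```python
from urllib.parse import quote

BASE_URL = "https://api.rev.ai/speechtotext/v1"

# Language codes listed in the enum for the language path parameter.
TRANSLATION_LANGUAGES = (
    "en", "en-us", "en-gb", "ar", "pt", "pt-br", "pt-pt", "fr", "fr-ca",
    "es", "es-es", "es-la", "it", "ja", "ko", "de", "ru",
)


def translated_transcript_url(job_id: str, language: str) -> str:
    """Build the URL for a translated transcript, validating the language code."""
    if language not in TRANSLATION_LANGUAGES:
        raise ValueError(f"unsupported translation language: {language!r}")
    return f"{BASE_URL}/jobs/{quote(job_id, safe='')}/transcript/translation/{language}"
```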

Get Transcript Summary By Id

Returns the transcript summary for a completed transcription job. Summary can be returned as either JSON or plaintext format. Summary output format can be specified in the Accept header. Returns plaintext by default.

Security: AccessToken

Request

path Parameters

id required	string Rev AI API Job Id

header Parameters

Accept

string

Default: text/plain

MIME type specifying summary output format

Enum: "text/plain" "application/json"

Responses

200

Transcript summary.

401

Request Unauthorized

403

User does not have permission to access this deployment

404

Job Not Found

409

Conflict

get/jobs/{id}/transcript/summary

Request samples

curl -X GET "https://api.rev.ai/speechtotext/v1/jobs/{id}/transcript/summary" -H "Authorization: Bearer $REV_ACCESS_TOKEN" -H "Accept: text/plain"

Response samples

No sample

Captions

Get Captions

Returns the caption output for a transcription job. We currently support SubRip (SRT) and Web Video Text Tracks (VTT) output. Caption output format can be specified in the Accept header. Returns SRT by default.

Note: For streaming jobs, transient failure of our storage during a live session may prevent the final hypothesis elements from saving properly, resulting in an incomplete caption file. This is rare, but not impossible.

Security: AccessToken

Request

path Parameters

id required	string Rev AI API Job Id

query Parameters

speaker_channel

integer

Identifies which channel of the job output to caption. Default is null which works only for jobs with no speaker_channels_count provided during job submission.

header Parameters

Accept

string

MIME type specifying the caption output format

Enum: "application/x-subrip" "text/vtt"

Responses

200

Rev AI API Captions

Note: Caption output format is required in the Accept header. The supported values are application/x-subrip (SRT) and text/vtt (VTT).

401

Request Unauthorized

403

User does not have permission to access this deployment

404

Job Not Found

405

Invalid Job Property

406

Invalid Caption Format

409

Conflict

get/jobs/{id}/captions

Request samples

curl -X GET "https://api.rev.ai/speechtotext/v1/jobs/{id}/captions" -H "Authorization: Bearer $REV_ACCESS_TOKEN" -H "Accept: application/x-subrip"

Response samples

1
00:00:01,210 --> 00:00:04,840
Hello there, this is an example captions output

2
00:00:07,350 --> 00:00:10,970
Each caption group is in the SubRip Text
file format

Get Translated Captions

Returns translated caption output for a transcription job. Translation must be requested as part of the submitted job. We currently support SubRip (SRT) and Web Video Text Tracks (VTT) output. Caption output format can be specified in the Accept header. Returns SRT by default.

Security: AccessToken

Request

path Parameters

id required	string Rev AI API Job Id
language required	string When requesting translated captions, it is important to specify the language code that corresponds to your desired translation language. This language code should be one of the target languages you previously defined in your job submission. Enum: "en" "en-us" "en-gb" "ar" "pt" "pt-br" "pt-pt" "fr" "fr-ca" "es" "es-es" "es-la" "it" "ja" "ko" "de" "ru" Example: en

header Parameters

Accept

string

MIME type specifying the caption output format

Enum: "application/x-subrip" "text/vtt"

Responses

200

Rev AI API Captions

Note: Caption output format is required in the Accept header. The supported values are application/x-subrip (SRT) and text/vtt (VTT).

401

Request Unauthorized

403

User does not have permission to access this deployment

404

Job Not Found

405

Invalid Job Property

406

Invalid Caption Format

409

Conflict

get/jobs/{id}/captions/translation/{languageId}

Request samples

curl -X GET "https://api.rev.ai/speechtotext/v1/jobs/{id}/captions/translation/{language}" -H "Authorization: Bearer $REV_ACCESS_TOKEN" -H "Accept: application/x-subrip"

Response samples

1
00:00:01,210 --> 00:00:04,840
Hello there, this is an example captions output

2
00:00:07,350 --> 00:00:10,970
Each caption group is in the SubRip Text
file format

Accounts

Get Account

Get the developer's account information

Security: AccessToken

Responses

200

Rev AI Account

401

Request Unauthorized

get/account

Request samples

curl -X GET "https://api.rev.ai/speechtotext/v1/account" -H "Authorization: Bearer $REV_ACCESS_TOKEN"

Response samples

application/json

{"email": "example@rev.ai",
"free_balance": 5.5,
"purchased_balance": 8.5,
"total_balance": 14,
"invoiced_balance": -9.5,
"balance_seconds": 0,
"hipaa_enabled": true
}
