Jobs

Get Job By Id

Returns information about a transcription job

Security: AccessToken
Request
path Parameters
id
required
string

Rev AI API Job Id

Responses
200

Transcription Job Details

401

Request Unauthorized

403

User does not have permission to access this deployment

404

Job Not Found

GET /jobs/{id}
Request samples
curl -X GET "https://api.rev.ai/speechtotext/v1/jobs/{id}" -H "Authorization: Bearer $REV_ACCESS_TOKEN"
Response samples
application/json
{
  "id": "Umx5c6F7pH7r",
  "status": "in_progress",
  "language": "en",
  "created_on": "2018-05-05T23:23:22.29Z",
  "transcriber": "machine"
}
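The curl sample above fetches a single job. A common follow-up is to poll this endpoint until the job finishes. The Python sketch below (standard library only) builds the same GET request and polls through an injectable callable, so the loop can be tested without network access. It assumes that "in_progress" is the only non-terminal status and treats any other status as final. The helper names (build_get_job_request, fetch_job, wait_for_job) and the default poll interval are illustrative, not part of the API.

```python
import json
import time
import urllib.request
from typing import Callable

# Base URL as used in the curl samples in this reference.
BASE_URL = "https://api.rev.ai/speechtotext/v1"

# Assumption: "in_progress" is the only non-terminal job status.
PENDING_STATUSES = {"in_progress"}


def build_get_job_request(job_id: str, token: str) -> urllib.request.Request:
    """Build (without sending) a GET /jobs/{id} request."""
    return urllib.request.Request(
        f"{BASE_URL}/jobs/{job_id}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def fetch_job(job_id: str, token: str) -> dict:
    """Send GET /jobs/{id} and decode the JSON job details."""
    with urllib.request.urlopen(build_get_job_request(job_id, token)) as resp:
        return json.load(resp)


def wait_for_job(
    get_job: Callable[[], dict],
    poll_seconds: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call get_job() until the job leaves the pending state."""
    for _ in range(max_polls):
        job = get_job()
        if job.get("status") not in PENDING_STATUSES:
            return job  # any other status is treated as final (assumption)
        sleep(poll_seconds)
    raise TimeoutError(f"job still in progress after {max_polls} polls")
```

For example, wait_for_job(lambda: fetch_job("Umx5c6F7pH7r", token)) polls every 30 seconds. Passing get_job and sleep as parameters keeps the polling logic separate from the HTTP call.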

Delete Job by Id

Deletes a transcription job. All data related to the job, such as input media and transcript, will be permanently deleted. A job can only be deleted once it's completed (either with success or failure).

Security: AccessToken
Request
path Parameters
id
required
string

Rev AI API Job Id

Responses
204

Job was successfully deleted

401

Request Unauthorized

403

User does not have permission to access this deployment

404

Job Not Found

409

Conflict

DELETE /jobs/{id}
Request samples
curl -X DELETE "https://api.rev.ai/speechtotext/v1/jobs/{id}" -H "Authorization: Bearer $REV_ACCESS_TOKEN"
Response samples
application/problem+json
{
  "title": "Authorization has been denied for this request",
  "status": 401
}
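Below is a Python sketch of the same DELETE call that maps each documented response code to a short outcome. Reading 409 Conflict as "the job has not completed yet" is an assumption inferred from the rule that only completed jobs can be deleted; the helper names are hypothetical.

```python
import urllib.error
import urllib.request

# Base URL as used in the curl samples in this reference.
BASE_URL = "https://api.rev.ai/speechtotext/v1"

# Outcomes for the response codes listed above.
DELETE_OUTCOMES = {
    204: "deleted",
    401: "unauthorized",
    403: "forbidden",
    404: "not found",
    # Assumption: 409 means the job is not yet completed.
    409: "conflict (job not completed yet?)",
}


def build_delete_job_request(job_id: str, token: str) -> urllib.request.Request:
    """Build (without sending) a DELETE /jobs/{id} request."""
    return urllib.request.Request(
        f"{BASE_URL}/jobs/{job_id}",
        headers={"Authorization": f"Bearer {token}"},
        method="DELETE",
    )


def describe_delete_status(status: int) -> str:
    """Translate an HTTP status code into a short outcome string."""
    return DELETE_OUTCOMES.get(status, f"unexpected status {status}")


def delete_job(job_id: str, token: str) -> str:
    """Send the DELETE request; urlopen raises HTTPError for 4xx codes."""
    try:
        with urllib.request.urlopen(build_delete_job_request(job_id, token)) as resp:
            return describe_delete_status(resp.status)
    except urllib.error.HTTPError as err:
        return describe_delete_status(err.code)
```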

Get List of Jobs

Gets a list of transcription jobs submitted within the last 30 days, in reverse chronological order, returning at most limit jobs per call. Jobs older than 30 days are not listed. To paginate, pass the id of the last job from the previous call as starting_after.

Security: AccessToken
Request
query Parameters
limit
integer or null [ 0 .. 1000 ]
Default: 100

Maximum number of jobs to return. The default is 100 and the maximum is 1000.

starting_after
string or null

If specified, returns only jobs submitted before the job with this id. The job with this id is excluded from the results.

Responses
200

List of Rev AI Transcription Jobs

400

Bad Request

401

Request Unauthorized

403

User does not have permission to access this deployment

GET /jobs
Request samples
# Get list of jobs with a limit of 10 jobs
curl -X GET "https://api.rev.ai/speechtotext/v1/jobs?limit=10" -H "Authorization: Bearer $REV_ACCESS_TOKEN"

# Get list of jobs starting after (submitted before) job with id Umx5c6F7pH7r
curl -X GET "https://api.rev.ai/speechtotext/v1/jobs?starting_after=Umx5c6F7pH7r" -H "Authorization: Bearer $REV_ACCESS_TOKEN"
Response samples
application/json
[
  { }
]
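The pagination scheme described above (pass the last job's id as starting_after) can be sketched in Python as a generator. It assumes that a page shorter than limit means no older jobs remain, and that each job object carries an id field as in the Get Job By Id sample. get_page is an injected callable, for example functools.partial(fetch_page, token=token), so the loop can be tested offline; all helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, List, Optional

# Base URL as used in the curl samples in this reference.
BASE_URL = "https://api.rev.ai/speechtotext/v1"


def build_list_jobs_url(limit: int = 100, starting_after: Optional[str] = None) -> str:
    """Build the GET /jobs URL, enforcing the documented 0..1000 limit range."""
    if not 0 <= limit <= 1000:
        raise ValueError("limit must be between 0 and 1000")
    params: dict = {"limit": limit}
    if starting_after is not None:
        params["starting_after"] = starting_after
    return f"{BASE_URL}/jobs?{urllib.parse.urlencode(params)}"


def fetch_page(url: str, token: str) -> List[dict]:
    """Send one GET /jobs request and decode the JSON array of jobs."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def iter_jobs(get_page: Callable[[str], List[dict]], limit: int = 100) -> Iterator[dict]:
    """Yield every listed job, newest first, following starting_after."""
    starting_after = None
    while True:
        page = get_page(build_list_jobs_url(limit, starting_after))
        yield from page
        # Assumption: an empty or short page means there are no older jobs.
        if not page or len(page) < limit:
            return
        starting_after = page[-1]["id"]
```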

Submit Transcription Job

Starts an asynchronous job to transcribe speech to text for a media file. The media file can be specified in one of two ways: by including a public URL to the media in the transcription job options, or by uploading a local file as part of a multipart/form-data request.

Security: AccessToken
Request
Request Body schema:

Transcription Job Options

media_url
string <= 2048 characters
Deprecated

[HIPAA Unsupported] Deprecated; use source_config instead. Direct-download media URL. Ignored if the job is submitted from a file. Note: Media files longer than 17 hours are not supported for English transcription. Media files longer than 6 hours are not supported for non-English transcription with language codes fa, he, id, ta and te; the other non-English language codes support media files up to 12 hours long. For non-English jobs, expected turnaround time can be up to 6 hours. If this parameter is used to pass in the media URL, the URL will be visible in the response. Using the source_config parameter instead is recommended, because it accepts authorization headers, and both the media URL and the auth headers are encrypted when stored.

source_config
object or null

Optional authorization headers, if they are needed to access the resource at the URL. Headers must be either a single Authorization header of the form <scheme> <token> or AWS Signature Version 4 headers (https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-auth-using-authorization-header.html). Only one of source_config and media_url may be set. This option will not be visible in the submission response.

metadata
string or null <= 512 characters

Optional metadata that was provided during job submission.

callback_url
string or null <= 1024 characters
Deprecated

Deprecated; use notification_config instead. Optional callback URL to invoke when processing is complete. If this parameter is used to pass in the callback URL, the URL will be visible in the response. Providing webhooks through the notification_config parameter is recommended, because it accepts authorization headers, and both the callback URL and the auth headers are encrypted when stored.

notification_config
object or null

Optional configuration for a callback url to invoke when processing is complete, in addition to auth headers if they are needed to invoke the callback url. Cannot be set if callback_url is set. This option will not be visible in the submission response.

delete_after_seconds
integer or null [ 0 .. 2592000 ]

Number of seconds after job completion after which the job is automatically deleted (the maximum, 2592000 seconds, is 30 days). Present only when this preference is set in the job request.

transcriber
string or null
Default: "machine"

Select which service you would like to transcribe this file with.

Model    Description
machine  The default; routes to our standard (v2) model.
human    [HIPAA Unsupported] Routes the file to our human transcribers.
Enum: "machine" "human"
verbatim
boolean or null
Default: false

[HIPAA Unsupported] Only available for the human transcriber option. When this field is set to true, the transcriber will transcribe every syllable, including all false starts and disfluencies.

rush
boolean or null
Default: false

[HIPAA Unsupported] Only available for the human transcriber option. When this field is set to true, your job is given higher priority and will be worked on sooner by our human transcribers.

test_mode
boolean or null
Default: false

[HIPAA Unsupported] Only available for the human transcriber option. When this field is set to true, the job mimics a normal human transcription job, except that no transcription takes place. The primary use case is testing integrations without being charged for human transcription.

Array of objects or null

[HIPAA Unsupported] Only available for human transcriber option. Use this option to specify which sections of the transcript need to be transcribed. Segments must be at least 1 minute in length and cannot overlap.

Array of objects or null [ 0 .. 100 ] items

[HIPAA Unsupported] Only available for human transcriber option. Use this option to specify up to 100 names of speakers in the transcript. Names may only be up to 50 characters long.

skip_diarization
boolean or null
Default: false

Specifies whether the speech engine skips speaker diarization.

skip_postprocessing
boolean or null
Default: false

Only available for English and Spanish languages. User-supplied preference on whether to skip post-processing operations such as inverse text normalization (ITN), casing and punctuation.

skip_punctuation
boolean or null
Default: false

Specifies whether the speech engine skips "punct" type elements. For JSON outputs, this also removes spaces. For text outputs, words are still delimited by a space.

remove_disfluencies
boolean or null
Default: false

Currently we only define disfluencies as 'ums' and 'uhs'. When set to true, disfluencies will not appear in the transcript.

filter_profanity
boolean or null
Default: false

Enabling this option will filter for approx. 600 profanities, which cover most use cases. If a transcribed word matches a word on this list, then all the characters of that word will be replaced by asterisks except for the first and last character.

speaker_channels_count
integer or null [ 1 .. 8 ]

Use to specify the total number of unique speaker channels in the audio.

Given the number of audio channels provided, each channel is transcribed separately and its channel id is assigned as the speaker label. The final output combines all individual channel outputs. Overlapping monologues are ordered by when the first spoken element of each monologue occurs. If speaker_channels_count is greater than the actual number of channels in the audio, the job will fail with invalid_media.

Note:

  • The amount charged will be the duration of the file multiplied by the number of channels specified.
  • When using speaker_channels_count, each channel is diarized as one speaker, and the value of skip_diarization is ignored if provided.

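As a worked example of the billing note: a 10-minute (600-second) file submitted with speaker_channels_count set to 2 is charged as 1200 seconds. The hypothetical helper below encodes that multiplication together with the documented 1 to 8 range; treating an unset speaker_channels_count as a single channel is an assumption.

```python
def billed_seconds(duration_seconds: float, speaker_channels_count: int = 1) -> float:
    """Charged duration = file duration x number of channels specified.

    Assumption: a job without speaker_channels_count is billed as one channel.
    """
    if not 1 <= speaker_channels_count <= 8:
        raise ValueError("speaker_channels_count must be between 1 and 8")
    return duration_seconds * speaker_channels_count
```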
custom_vocabulary_id
string or null

This feature is in beta. You can supply the id of a pre-completed custom vocabulary that you submitted through the Custom Vocabularies API instead of uploading the list of phrases using the custom_vocabularies parameter. Using custom_vocabulary_id or custom_vocabularies with the same list of phrases yields the same transcription result, but custom_vocabulary_id lets your submission finish processing about 6 seconds faster on average.

You cannot use both custom_vocabulary_id and custom_vocabularies at the same time, and doing so will result in a 400 response. If the supplied id represents an incomplete, deleted, or non-existent custom vocabulary then you will receive a 404 response.

custom_vocabularies
Array of objects [ 1 .. 50 ] items

Specify a collection of custom vocabulary to be used for this job. Custom vocabulary informs and biases the speech recognition to find those phrases (at the cost of slightly slower transcription).

language
string or null
Default: "en"

language is provided as an ISO 639-1 language code, with the exception of Mandarin (cmn), which is supplied as an ISO 639-3 language code.

Only one language can be selected per audio file; a single transcription job cannot contain multiple languages. Additionally, the following parameters may not be used with non-English languages: skip_punctuation, remove_disfluencies, filter_profanity, speaker_channels_count, custom_vocabulary_id.

You can provide a language parameter for transcribing audio in one of the following languages:

Language Language Code HIPAA Support EU Global Deployment Support
Arabic ar x
Bulgarian bg
Catalan ca
Croatian hr
Czech cs
Danish da x
Dutch nl x
English en x x
Farsi fa x
Finnish fi
French fr x
German de x
Greek el
Hebrew he x
Hindi hi x
Hungarian hu
Indonesian id x
Italian it x
Japanese ja x
Korean ko x
Lithuanian lt
Latvian lv
Malay ms x
Mandarin cmn x
Norwegian no
Polish pl
Portuguese pt x
Romanian ro
Russian ru x
Slovak sk
Slovenian sl
Spanish es x
Swedish sv
Tamil ta x
Telugu te x
Turkish tr x
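Several of the constraints described above (mutually exclusive parameter pairs, English-only parameters, human-only parameters, and numeric ranges) can be checked client-side before submitting. The Python sketch below collects them in one hypothetical check_job_options function. It treats a parameter as "used" when it is set to anything other than null or false, which is an assumption; the API remains the authority and enforces rules this sketch does not check, such as media duration limits.

```python
from typing import Any, Dict, List

# Parameters documented above as unavailable for non-English jobs.
ENGLISH_ONLY = ("skip_punctuation", "remove_disfluencies", "filter_profanity",
                "speaker_channels_count", "custom_vocabulary_id")
# Parameters documented above as only available for the human transcriber.
HUMAN_ONLY = ("verbatim", "rush", "test_mode")
# Pairs of parameters that cannot both be set.
EXCLUSIVE_PAIRS = (("media_url", "source_config"),
                   ("callback_url", "notification_config"),
                   ("custom_vocabulary_id", "custom_vocabularies"))


def check_job_options(opts: Dict[str, Any]) -> List[str]:
    """Return the documented constraints that opts violates (empty if none)."""
    problems = []
    for a, b in EXCLUSIVE_PAIRS:
        if opts.get(a) is not None and opts.get(b) is not None:
            problems.append(f"{a} and {b} cannot both be set")
    language = opts.get("language") or "en"
    if language != "en":
        for name in ENGLISH_ONLY:
            # Assumption: null or false counts as "not used".
            if opts.get(name) not in (None, False):
                problems.append(f"{name} is not supported for language {language!r}")
    if (opts.get("transcriber") or "machine") != "human":
        for name in HUMAN_ONLY:
            if opts.get(name):
                problems.append(f"{name} requires transcriber 'human'")
    count = opts.get("speaker_channels_count")
    if count is not None and not 1 <= count <= 8:
        problems.append("speaker_channels_count must be between 1 and 8")
    delete_after = opts.get("delete_after_seconds")
    if delete_after is not None and not 0 <= delete_after <= 2592000:
        problems.append("delete_after_seconds must be between 0 and 2592000")
    return problems
```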
Responses
200

Transcription Job Details

400

Bad Request

401

Request Unauthorized

403

Request Forbidden

413

Payload Too Large


Only returned when the job is submitted with a local file as part of a multipart/form-data request. For files larger than 2 GB, submit the job with the source_config parameter instead.

POST /jobs
Request samples
{
  "metadata": "example metadata",
  "notification_config": {},
  "source_config": {},
  "transcriber": "machine",
  "skip_diarization": false,
  "skip_punctuation": false,
  "skip_postprocessing": false,
  "remove_disfluencies": false,
  "filter_profanity": false,
  "speaker_channels_count": 1,
  "delete_after_seconds": 2592000,
  "custom_vocabulary_id": null,
  "language": "en"
}
Response samples
application/json
{
  "id": "Umx5c6F7pH7r",
  "status": "in_progress",
  "language": "en",
  "created_on": "2018-05-05T23:23:22.29Z",
  "transcriber": "machine"
}
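The request sample above is a JSON body. A Python sketch of sending such a body with the standard library might look like the following. It assumes the JSON variant of the request is sent with Content-Type: application/json; local-file uploads use multipart/form-data and are not covered here. The function names are hypothetical.

```python
import json
import urllib.request

# Base URL as used in the curl samples in this reference.
BASE_URL = "https://api.rev.ai/speechtotext/v1"


def build_submit_job_request(options: dict, token: str) -> urllib.request.Request:
    """Build (without sending) a POST /jobs request with a JSON body."""
    body = json.dumps(options).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/jobs",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            # Assumption: JSON submissions are sent as application/json.
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit_job(options: dict, token: str) -> dict:
    """Send the request and decode the returned job details."""
    with urllib.request.urlopen(build_submit_job_request(options, token)) as resp:
        return json.load(resp)
```

With options shaped like the request sample (and source_config filled in with the media location), submit_job should return job details like the response sample, including the id used by the other endpoints.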

Transcripts

Get Transcript By Id

Returns the transcript for a completed transcription job. The transcript can be returned in either JSON or plain-text format; the output format can be specified in the Accept header. Returns JSON by default.


Note: For streaming jobs, a transient failure of our storage during a live session may prevent the final hypothesis elements from being saved properly, resulting in an incomplete transcript. This is rare, but not impossible. To guarantee 100% completeness, we recommend capturing all final hypotheses on the client as you receive them.

Security: AccessToken
Request
path Parameters
id
required
string

Rev AI API Job Id

header Parameters
Accept
string

MIME type specifying the transcription output format

Enum: "application/vnd.rev.transcript.v1.0+json" "text/plain"
Responses
200

Rev AI API Transcript


Note: Transcript output format is required in the Accept header. Output can either be in Rev's JSON format or plaintext.

401

Request Unauthorized

403

User does not have permission to access this deployment

404

Job Not Found

406

Invalid Transcript Format

409

Conflict

GET /jobs/{id}/transcript
Request samples
curl -X GET "https://api.rev.ai/speechtotext/v1/jobs/{id}/transcript" -H "Authorization: Bearer $REV_ACCESS_TOKEN" -H "Accept: application/vnd.rev.transcript.v1.0+json"
Response samples
{
  "monologues": []
}
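When the JSON format is requested, the response contains a monologues array. The Python sketch below flattens it into one line per monologue. The element layout it relies on (each monologue carrying a speaker and an elements list, each element carrying a type and a value, with spaces and punctuation delivered as "punct" elements) is an assumption based on the skip_punctuation description, not a schema given in this reference.

```python
from typing import List


def transcript_to_lines(transcript: dict) -> List[str]:
    """Render a JSON transcript as one "Speaker N: text" line per monologue.

    Assumed layout: monologues[].speaker and monologues[].elements[].value,
    where spaces and punctuation are their own "punct" elements.
    """
    lines = []
    for monologue in transcript.get("monologues", []):
        # Concatenating element values rebuilds the text, spaces included.
        text = "".join(el.get("value", "") for el in monologue.get("elements", []))
        lines.append(f"Speaker {monologue.get('speaker', '?')}: {text.strip()}")
    return lines
```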

Captions

Get Captions

Returns the caption output for a transcription job. We currently support SubRip (SRT) and Web Video Text Tracks (VTT) output. Caption output format can be specified in the Accept header. Returns SRT by default.


Note: For streaming jobs, a transient failure of our storage during a live session may prevent the final hypothesis elements from being saved properly, resulting in an incomplete caption file. This is rare, but not impossible.

Security: AccessToken
Request
path Parameters
id
required
string

Rev AI API Job Id

query Parameters
speaker_channel
integer

Identifies which channel of the job output to caption. The default is null, which works only for jobs submitted without speaker_channels_count.

header Parameters
Accept
string

MIME type specifying the caption output format

Enum: "application/x-subrip" "text/vtt"
Responses
200

Rev AI API Captions


Note: Caption output format is required in the Accept header. The supported values are application/x-subrip (SRT) and text/vtt (VTT).

401

Request Unauthorized

403

User does not have permission to access this deployment

404

Job Not Found

405

Invalid Job Property

406

Invalid Caption Format

409

Conflict

GET /jobs/{id}/captions
Request samples
curl -X GET "https://api.rev.ai/speechtotext/v1/jobs/{id}/captions" -H "Authorization: Bearer $REV_ACCESS_TOKEN" -H "Accept: application/x-subrip"
Response samples
1
00:00:01,210 --> 00:00:04,840
Hello there, this is an example captions output

2
00:00:07,350 --> 00:00:10,970
Each caption group is in the SubRip Text
file format
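A Python sketch of requesting captions (including the speaker_channel query parameter for multichannel jobs) and parsing SRT output like the sample above into (start_seconds, end_seconds, text) tuples. The numbered blocks and HH:MM:SS,mmm timing lines are standard SubRip; the helper names and the choice to join multi-line cues with a space are illustrative, and the parser does not handle VTT, which uses a different layout.

```python
import re
import urllib.parse
import urllib.request
from typing import List, Optional, Tuple

# Base URL as used in the curl samples in this reference.
BASE_URL = "https://api.rev.ai/speechtotext/v1"
CAPTION_TYPES = {"srt": "application/x-subrip", "vtt": "text/vtt"}
TIMING = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})")


def build_captions_request(job_id: str, token: str, fmt: str = "srt",
                           speaker_channel: Optional[int] = None) -> urllib.request.Request:
    """Build (without sending) a GET /jobs/{id}/captions request."""
    url = f"{BASE_URL}/jobs/{job_id}/captions"
    if speaker_channel is not None:
        url += "?" + urllib.parse.urlencode({"speaker_channel": speaker_channel})
    return urllib.request.Request(url, headers={
        "Authorization": f"Bearer {token}",
        "Accept": CAPTION_TYPES[fmt],
    })


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    """Convert captured timestamp fields to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(text: str) -> List[Tuple[float, float, str]]:
    """Split SRT text into (start, end, caption) tuples."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 2:
            continue
        match = TIMING.match(lines[1])  # line 0 is the cue number
        if not match:
            continue
        g = match.groups()
        cues.append((_seconds(*g[:4]), _seconds(*g[4:]), " ".join(lines[2:])))
    return cues
```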

Accounts

Get Account

Get the developer's account information

Security: AccessToken
Responses
200

Rev AI Account

401

Request Unauthorized

GET /account
Request samples
curl -X GET "https://api.rev.ai/speechtotext/v1/account" -H "Authorization: Bearer $REV_ACCESS_TOKEN"
Response samples
application/json
{
  "email": "example@rev.ai",
  "free_balance": 5.5,
  "purchased_balance": 8.5,
  "total_balance": 14,
  "invoiced_balance": -9.5,
  "balance_seconds": 0,
  "hipaa_enabled": true
}