The Custom Vocabulary API improves the accuracy of ASR transcription in both streaming and asynchronous modes.
The base URL for this version of the API is
https://api.rev.ai/speechtotext/v1. All endpoints described in this documentation are relative to this base URL.
Clients must authenticate by including their Rev AI access token in the
Authorization: header of their requests. If the access token is invalid or the header is not present, a
401 error code will be returned.
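As a minimal sketch of the authentication requirement above, the following builds (but does not send) a request carrying the access token. The `Bearer` scheme, the `/vocabularies` path, the request body shape, and the helper name `build_vocabulary_request` are assumptions made for illustration and are not confirmed by this section.

```python
import json
import urllib.request

BASE_URL = "https://api.rev.ai/speechtotext/v1"


def build_vocabulary_request(access_token: str, phrases: list[str]) -> urllib.request.Request:
    """Build, without sending, an authenticated request relative to the base URL."""
    # Assumption: the endpoint path and body shape below are illustrative only.
    body = json.dumps({"custom_vocabularies": [{"phrases": phrases}]}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/vocabularies",
        data=body,
        method="POST",
        headers={
            # Assumption: the token is sent using the Bearer scheme.
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; a missing or invalid token would then surface as an `HTTPError` with code 401.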
The following default limits apply per user for the Custom Vocabulary API:
- 150 custom vocabulary requests submitted every 2 minutes.
- Up to 6000 phrases may be submitted per transcription job for English, and up to 1000 for other languages. Refer to the general rules for more information.
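One way a client might stay under the request limit is a sliding-window counter, sketched below. The class name `SlidingWindowLimiter` and its parameters are hypothetical; only the figures of 150 requests per 2 minutes come from the limits above, and the server's own accounting may differ.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side guard: allow at most max_requests per window_seconds."""

    def __init__(self, max_requests=150, window_seconds=120.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock          # injectable so the limiter can be tested
        self._sent = deque()        # timestamps of requests inside the window

    def try_acquire(self) -> bool:
        """Return True and record the request if it fits in the current window."""
        now = self.clock()
        # Forget requests that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            self._sent.append(now)
            return True
        return False
```

A caller would check `try_acquire()` before each submission and wait or queue the request when it returns `False`.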
The API indicates failure with 4xx and 5xx HTTP status codes.
4xx status codes indicate an error due to the request provided (for example, a required parameter was omitted).
5xx status codes indicate an error with Rev AI's servers.
The following table lists common
4xx error codes and troubleshooting suggestions for each:
| Status code | Title | Troubleshooting |
|---|---|---|
| 400 | Invalid Request | Verify that the request parameter names and values are valid. |
| 401 | Authorization Denied | Verify that the access token is valid. |
| 403 | Access Denied to Deployment | Verify that the account has permission to access the API endpoint for the selected deployment or region. |
| 404 | Resource Not Found / Unsupported Deployment | Verify that the identifier is valid and that the API endpoint is valid for the selected deployment or region. |
| 409 | Invalid Job State | Verify that the job has completed processing. |
If a 4xx error occurs during invocation of a request, the API responds with a problem details HTTP response payload.
Some errors can be resolved simply by retrying the request. The following error codes are likely to be resolved with successive retries.
| Status code | Title |
|---|---|
| 409 | Invalid Job State |
| 429 | Too Many Requests |
With the exception of the 429 status code, it is recommended to limit retries to a maximum of 5 attempts per request. The number of retries can be higher for 429 errors, but if you notice consistent throttling, please contact the support team at email@example.com.
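The retry guidance above could be sketched as follows. The function name `send_with_retries`, the higher cap of 10 retries for 429 responses, and the exponential backoff delays are illustrative assumptions; the source only recommends a limit of 5 retries for other retryable errors and says the limit can be higher for 429.

```python
import time

RETRYABLE = {409, 429}  # status codes the text describes as worth retrying


def send_with_retries(send, max_retries=5, max_retries_429=10,
                      base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call send() (which returns an HTTP status code), retrying on 409 and 429.

    Returns the last status code observed.
    """
    retries = 0
    while True:
        status = send()
        if status not in RETRYABLE:
            return status
        # Assumption: 429 gets a higher, but still bounded, retry budget.
        limit = max_retries_429 if status == 429 else max_retries
        if retries >= limit:
            return status
        # Exponential backoff, capped so waits do not grow without bound.
        sleep(min(base_delay * 2 ** retries, max_delay))
        retries += 1
```

Here `send` is any zero-argument callable that performs the request, which keeps the retry policy independent of the HTTP client in use.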
The problem details information is represented as a JSON object with the following optional properties:
||A URI representing the type for the error|
||A short human readable description of type|
||Additional details of the error|
||HTTP status code of the error|
In addition to the properties listed above, the problem details object may list additional properties that help to troubleshoot the problem.
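A client could turn such a payload into a log-friendly message roughly as shown below. The property names `type`, `title`, `detail`, and `status` are an assumption borrowed from the general problem details convention (RFC 7807), not names confirmed by this section, and `describe_problem` is a hypothetical helper. Every property is treated as optional, and any additional properties are kept for troubleshooting.

```python
import json

# Assumption: property names follow the generic problem details convention.
KNOWN_PROPERTIES = ("type", "title", "detail", "status")


def describe_problem(payload: str) -> str:
    """Summarise a problem details JSON object as a single line of text."""
    problem = json.loads(payload)
    parts = []
    if "status" in problem:
        parts.append(f"HTTP {problem['status']}")
    if "title" in problem:
        parts.append(str(problem["title"]))
    if "detail" in problem:
        parts.append(str(problem["detail"]))
    # Keep any extra troubleshooting properties the API chose to include.
    extras = {k: v for k, v in problem.items() if k not in KNOWN_PROPERTIES}
    if extras:
        parts.append("extra: " + json.dumps(extras, sort_keys=True))
    return " | ".join(parts) if parts else "unknown error"
```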
Custom vocabularies are submitted as a list of phrases. A phrase can be one word or multiple words, usually describing a single object or concept.
- Up to 6000 phrases may be submitted per transcription job for English, and up to 1000 for other languages.
- Phrases must contain at least one alphabetic character from the respective language.
- Phrases cannot be longer than 12 words.
- Individual words cannot be longer than 34 characters.
- For English, non-numeric characters in the Basic Latin set are allowed, i.e. (U+0000-U+002F and U+003A-U+007F). For other languages, most non-numeric characters in the language are allowed.
Non-alphabetic characters will be ignored during speech recognition, but will be favored in the output. For example, if you submit Yahoo! as a custom vocabulary phrase, speech recognition will favor outputting Yahoo! when it recognizes 'yahoo' in the audio.
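The general rules above lend themselves to a client-side check before submission. The sketch below is an approximation under stated assumptions: the language code `"en"` and the function name `validate_phrases` are hypothetical, `str.isalpha` accepts letters from any script rather than only the job's language, and the exact allowed character ranges are not enforced. The API remains the authority on what is accepted.

```python
MAX_PHRASES_ENGLISH = 6000
MAX_PHRASES_OTHER = 1000
MAX_WORDS_PER_PHRASE = 12
MAX_CHARS_PER_WORD = 34


def validate_phrases(phrases, language="en"):
    """Return a list of human-readable rule violations (empty if none found)."""
    errors = []
    # Assumption: "en" identifies English; every other code gets the lower limit.
    limit = MAX_PHRASES_ENGLISH if language == "en" else MAX_PHRASES_OTHER
    if len(phrases) > limit:
        errors.append(f"too many phrases: {len(phrases)} > {limit}")
    for phrase in phrases:
        words = phrase.split()
        if not any(ch.isalpha() for ch in phrase):
            errors.append(f"{phrase!r}: needs at least one alphabetic character")
        if any(ch.isdigit() for ch in phrase):
            errors.append(f"{phrase!r}: numbers are not allowed")
        if len(words) > MAX_WORDS_PER_PHRASE:
            errors.append(f"{phrase!r}: longer than {MAX_WORDS_PER_PHRASE} words")
        if any(len(word) > MAX_CHARS_PER_WORD for word in words):
            errors.append(f"{phrase!r}: a word exceeds {MAX_CHARS_PER_WORD} characters")
    return errors
```

Running the check locally avoids spending requests from the rate limit on submissions that would be rejected with a 400 response.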
Rules for initialisms
Initialisms are abbreviations consisting of initial letters pronounced separately, like CPU. Submit your initialisms as custom vocabulary to improve their speech recognition.
The following rules apply:
- Initialisms have to contain at least 3 letters
- An initialism will be recognized only when pronounced letter by letter
- Ampersands (&) are supported and will be treated as a letter pronounced like 'and'
Initialisms will be recognized when submitted in the following formats only:
Rules for non-alphabetic characters
The following rules apply for specific characters:
- Numbers are not allowed, but note that some common terms like 401k are recognized by default and as such do not need to be added as custom vocabulary.
- Standalone ampersands in phrases will be treated like the word 'and'.
- Dashes in words will be ignored. For example, this-and-that will be treated as a single word roughly pronounced like 'this and that'.
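To make the ampersand and dash rules concrete, the sketch below approximates how a phrase would be read aloud. It assumes a standalone ampersand is read as the word 'and'. The helper `approximate_spoken_form` is purely illustrative and says nothing about how the recognizer tokenizes phrases internally.

```python
def approximate_spoken_form(phrase: str) -> str:
    """Approximate the pronunciation implied by the ampersand and dash rules."""
    words = []
    for token in phrase.split():
        if token == "&":
            # Assumption: a standalone ampersand is spoken as "and".
            words.append("and")
        else:
            # Dashes are ignored, so each dash-separated part is spoken in turn.
            words.extend(part for part in token.split("-") if part)
    return " ".join(words)
```

For instance, `approximate_spoken_form("this-and-that")` returns `"this and that"`.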