Get Started with Speech Recognition in Python
By Vikram Vaswani, Developer Advocate - September 27, 2022
Introduction
Rev AI's speech-to-text APIs power automatic speech recognition in thousands of applications and services. To make it easier for developers to integrate these APIs into their applications, Rev AI also offers SDKs for many programming languages, including the topic of this tutorial, Python.
In this tutorial, I'll introduce you to the basics of using Rev AI's Asynchronous Speech-to-Text API using Python and the Rev AI Python SDK. If you've ever wondered how to integrate speech recognition capabilities with your Python application, this tutorial will give you all the information you need to get started.
Assumptions
This tutorial assumes that:
- You have a Rev AI account and access token. If not, sign up for a free account and generate an access token.
- You have a properly-configured Python development environment with Python 3.x. If not, download and install Python for your operating system.
- You have installed pip, the Python dependency manager. If not, download and install pip for your operating system.
- You have an audio file to transcribe. If not, use this example audio file from Rev AI.
Step 1: Install the SDK
This tutorial will use the Rev AI Python SDK to submit transcription requests to the Rev AI Asynchronous Speech-to-Text API.
Begin by installing the SDK with pip:
pip install --upgrade rev_ai
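To confirm that the installation succeeded, one option is to ask Python's package metadata for the installed version. This is only a sketch and not part of the SDK: the helper name installed_version is hypothetical, it needs Python 3.8 or later, and it assumes the package metadata is registered under the same name used with pip (rev_ai).

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Assumption: the distribution name matches the name passed to pip
version = installed_version("rev_ai")
if version is None:
    print("rev_ai is not installed; run: pip install --upgrade rev_ai")
else:
    print("rev_ai version " + version + " is installed")
```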
Within your application code, initialize the Rev AI API client as below. Replace the <REVAI_ACCESS_TOKEN> placeholder with your Rev AI access token:
from rev_ai import apiclient
# configure access token
token = "<REVAI_ACCESS_TOKEN>"
# initialize Rev AI API client
client = apiclient.RevAiAPIClient(token)
Here, the Rev AI API client is automatically initialized with the base endpoint for the Asynchronous Speech-to-Text API, which is https://api.rev.ai/speechtotext/v1/.
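Hard-coding the access token is convenient for a tutorial, but you may prefer to keep it out of your source files. The following sketch reads it from an environment variable instead. The helper name load_token and the variable name REVAI_ACCESS_TOKEN are assumptions made for this example; the SDK does not require either.

```python
import os


def load_token(env=os.environ, name="REVAI_ACCESS_TOKEN"):
    """Read the access token from an environment mapping.

    The variable name REVAI_ACCESS_TOKEN is an assumption for this sketch,
    not something the SDK requires.
    """
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError("Set the " + name + " environment variable first")
    return token


# Usage (requires the rev_ai package and a real token):
#   from rev_ai import apiclient
#   client = apiclient.RevAiAPIClient(load_token())
```

Passing the environment as a parameter keeps the helper easy to test with a plain dictionary.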
Every request to the API must be in JSON format and must include an Authorization header containing the API access token. The Rev AI Python SDK automatically takes care of attaching this required header to all its client requests.
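To make the header requirement concrete, here is a rough sketch of what a request looks like without the SDK. It only builds the request object with Python's standard library and does not send it. The helper name build_request is hypothetical, and the use of the Bearer scheme in the header value is an assumption you should confirm against the API reference.

```python
from urllib.request import Request

# Base endpoint of the Asynchronous Speech-to-Text API
BASE_URL = "https://api.rev.ai/speechtotext/v1/"


def build_request(token, path):
    """Build (but do not send) a request carrying the access token."""
    request = Request(BASE_URL + path)
    # Assumption: the token is sent using the Bearer scheme
    request.add_header("Authorization", "Bearer " + token)
    return request


request = build_request("<REVAI_ACCESS_TOKEN>", "jobs")
print(request.full_url)
print(request.get_header("Authorization"))
```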
Step 2: Submit a file for transcription
To generate a transcript from an audio file, you must submit an HTTP POST request to the API endpoint at https://api.rev.ai/speechtotext/v1/jobs. The Rev AI Python SDK simplifies this process with two methods: submit_job_local_file() and submit_job_url(), for local and remote files respectively.
The following example demonstrates how to submit a local audio file for transcription.
To use this example, replace the <FILEPATH> placeholder with the path to the file you wish to transcribe and the <REVAI_ACCESS_TOKEN> placeholder with your Rev AI account's access token.
from rev_ai import apiclient
# configure access token and audio source
token = "<REVAI_ACCESS_TOKEN>"
filepath = "<FILEPATH>"
# initialize Rev AI API client
client = apiclient.RevAiAPIClient(token)
# submit a file for transcription
job = client.submit_job_local_file(filepath)
# get job id
job_id = job.id
print("Job submitted with id: " + job_id)
To run this example, save it as a file, such as example.py, and then execute python example.py.
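If the file path is mistyped, it can help to fail early with a clear message before anything is sent to the API. The sketch below wraps the submit_job_local_file() call in a small check. The wrapper name submit_local_file is hypothetical; it accepts any object that offers a submit_job_local_file() method, such as the client created above.

```python
from pathlib import Path


def submit_local_file(client, filepath):
    """Check that the file exists, then submit it for transcription."""
    path = Path(filepath)
    if not path.is_file():
        raise FileNotFoundError("No audio file found at: " + str(path))
    # submit_job_local_file() is the SDK method used in the example above
    return client.submit_job_local_file(str(path))


# Usage (requires the rev_ai package and a real token):
#   job = submit_local_file(client, "<FILEPATH>")
```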
In this example, the API client internally makes a POST request to the API, passing it the audio file to be transcribed. The response body is then received and converted into a Python object.
Here is an example of the API response, represented as a Python object:
{'callback_url': (None,),
'completed_on': None,
'created_on': '2022-09-14T14:43:35.46Z',
'custom_vocabulary_id': None,
'delete_after_seconds': None,
'duration_seconds': None,
'failure': None,
'failure_detail': None,
'filter_profanity': None,
'id': 'xsDRpD6ladtf',
'language': 'en',
'media_url': None,
'metadata': None,
'name': 'myfile.mp3',
'remove_disfluencies': None,
'rush': None,
'segments_to_transcribe': None,
'skip_diarization': None,
'skip_punctuation': None,
'speaker_channels_count': None,
'status': <JobStatus.IN_PROGRESS: 1>,
'transcriber': None,
'verbatim': None}
The API response contains a job identifier (id field). This job identifier will be required to check the job status and obtain the job result.
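Because the job identifier is needed again later, possibly in a separate script, you may want to store it somewhere. The following sketch writes it to a small JSON file and reads it back. The function names and the file name revai_job.json are hypothetical choices for illustration only.

```python
import json
from pathlib import Path


def save_job_id(job_id, path="revai_job.json"):
    """Write the job identifier to a small JSON file."""
    Path(path).write_text(json.dumps({"job_id": job_id}))


def load_job_id(path="revai_job.json"):
    """Read the job identifier back from the JSON file."""
    return json.loads(Path(path).read_text())["job_id"]


# Usage after submitting a job:
#   save_job_id(job.id)
# and later, in another script:
#   job_id = load_job_id()
```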
It is also possible to use a remote audio file, as shown in the following example:
from rev_ai import apiclient
# configure access token and audio source
token = "<REVAI_ACCESS_TOKEN>"
url = "<URL>"
# initialize Rev AI API client
client = apiclient.RevAiAPIClient(token)
# submit a file for transcription
job = client.submit_job_url(url)
# get job id
job_id = job.id
print("Job submitted with id: " + job_id)
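Since the API has to fetch a remote file itself, a malformed URL will only surface as a failure after submission. A quick local check can catch obvious mistakes first. This is a minimal sketch using the standard library; the helper name is_submittable_url is hypothetical, and the check only looks at the URL's shape, not whether the file is actually reachable.

```python
from urllib.parse import urlparse


def is_submittable_url(url):
    """Return True if the URL has an http(s) scheme and a host name."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(is_submittable_url("https://example.com/audio/interview.mp3"))  # True
print(is_submittable_url("<URL>"))  # False: the placeholder was not replaced
```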
Step 3: Check transcription status
To check the status of the transcription job, you must submit an HTTP GET request to the API endpoint at https://api.rev.ai/speechtotext/v1/jobs/<ID>, where <ID> is a placeholder for the job identifier. Again, the Rev AI Python SDK makes this easy with its get_job_details() method, which accepts a job identifier as input and returns the current status of the job as a Python object.
The following example demonstrates how to check the status of an asynchronous transcription job.
To use this example, replace the <ID> placeholder with the job identifier and the <REVAI_ACCESS_TOKEN> placeholder with your Rev AI account's access token.
from rev_ai import apiclient
# configure access token and job identifier
token = "<REVAI_ACCESS_TOKEN>"
job_id = "<ID>"
# initialize Rev AI API client
client = apiclient.RevAiAPIClient(token)
# check job status
status = client.get_job_details(job_id)
# print response object
print(vars(status))
Here is an example of the response object received after the job has completed:
{'callback_url': (None,),
'completed_on': '2022-09-14T14:44:09.774Z',
'created_on': '2022-09-14T14:43:35.46Z',
'custom_vocabulary_id': None,
'delete_after_seconds': None,
'duration_seconds': 107.0,
'failure': None,
'failure_detail': None,
'filter_profanity': None,
'id': 'xsDRpD6ladtf',
'language': 'en',
'media_url': None,
'metadata': None,
'name': 'myfile.mp3',
'remove_disfluencies': None,
'rush': None,
'segments_to_transcribe': None,
'skip_diarization': None,
'skip_punctuation': None,
'speaker_channels_count': None,
'status': <JobStatus.TRANSCRIBED: 2>,
'transcriber': None,
'verbatim': None}
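The created_on and completed_on fields in this response make it possible to work out how long the job took. The sketch below does this for the two timestamps shown above. The helper name processing_seconds is hypothetical, and the format string assumes the timestamps always include fractional seconds and a trailing Z, as they do in this example.

```python
from datetime import datetime

# Assumption: timestamps always look like 2022-09-14T14:43:35.46Z
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def processing_seconds(created_on, completed_on):
    """Return the seconds elapsed between two job timestamps."""
    started = datetime.strptime(created_on, TIMESTAMP_FORMAT)
    finished = datetime.strptime(completed_on, TIMESTAMP_FORMAT)
    return (finished - started).total_seconds()


elapsed = processing_seconds("2022-09-14T14:43:35.46Z", "2022-09-14T14:44:09.774Z")
print(elapsed)
```

For the job shown above, this works out to roughly 34 seconds of processing for 107 seconds of audio.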
Step 4: Retrieve the transcript
Once the job's status changes to TRANSCRIBED, you can retrieve the results by submitting an HTTP GET request to the API endpoint at https://api.rev.ai/speechtotext/v1/jobs/<ID>/transcript, where <ID> is a placeholder for the job identifier. The Rev AI Python SDK offers three methods for this: get_transcript_text(), get_transcript_json() and get_transcript_object(), which return the transcript as plaintext, JSON and a Python object respectively.
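If your application needs to support more than one of these formats, one possible approach is a small dispatcher that maps a format name to the matching SDK method. This is only a sketch: the function name fetch_transcript and the format names "text", "json" and "object" are hypothetical, while the three methods it calls are the ones named above.

```python
def fetch_transcript(client, job_id, fmt="text"):
    """Fetch a transcript in the requested format using the matching SDK method."""
    methods = {
        "text": client.get_transcript_text,
        "json": client.get_transcript_json,
        "object": client.get_transcript_object,
    }
    if fmt not in methods:
        raise ValueError("fmt must be one of: " + ", ".join(sorted(methods)))
    return methods[fmt](job_id)


# Usage (requires the rev_ai package, a real token and a finished job):
#   print(fetch_transcript(client, "<ID>", fmt="text"))
```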
The following example demonstrates how to retrieve the results of an asynchronous transcription job.
To use this example, replace the <ID> placeholder with the job identifier and the <REVAI_ACCESS_TOKEN> placeholder with your Rev AI account's access token.
from rev_ai import apiclient
# configure access token and job identifier
token = "<REVAI_ACCESS_TOKEN>"
job_id = "<ID>"
# initialize Rev AI API client
client = apiclient.RevAiAPIClient(token)
# get transcript
transcript = client.get_transcript_json(job_id)
# print transcript
print(transcript)
Here is an example of the transcript returned from a successful job, represented as JSON:
{
"monologues": [
{
"speaker": 0,
"elements": [
{
"type": "text",
"value": "Hi",
"ts": 0.17,
"end_ts": 0.52,
"confidence": 1
},
{
"type": "punct",
"value": ","
},
{
"type": "punct",
"value": " "
},
{
"type": "text",
"value": "my",
"ts": 0.52,
"end_ts": 0.76,
"confidence": 1
},
...
]
},
...
]
}
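The JSON transcript is detailed, but often you just want readable text per speaker. Since each monologue holds a list of elements with a value field, joining those values in order rebuilds the spoken text. The sketch below does this for a shortened copy of the transcript shown above; the function name transcript_to_text and the "Speaker N:" label format are my own choices, not part of the SDK.

```python
def transcript_to_text(transcript):
    """Join the element values of each monologue into one line per speaker turn."""
    lines = []
    for monologue in transcript.get("monologues", []):
        text = "".join(element["value"] for element in monologue.get("elements", []))
        lines.append("Speaker " + str(monologue.get("speaker")) + ": " + text)
    return "\n".join(lines)


# A shortened transcript in the same shape as the example above
sample = {
    "monologues": [
        {
            "speaker": 0,
            "elements": [
                {"type": "text", "value": "Hi", "ts": 0.17, "end_ts": 0.52, "confidence": 1},
                {"type": "punct", "value": ","},
                {"type": "punct", "value": " "},
                {"type": "text", "value": "my", "ts": 0.52, "end_ts": 0.76, "confidence": 1},
            ],
        }
    ]
}

print(transcript_to_text(sample))  # prints: Speaker 0: Hi, my
```

Punctuation and spaces arrive as their own elements, so a plain join without a separator is enough.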
Step 5: Create and test a simple application
Using the code samples shown previously, it's possible to create a simple application that accepts an audio file URL and returns a transcript, as shown below:
from rev_ai import apiclient
from time import sleep

def main(token, url):
    # initialize Rev AI API client
    client = apiclient.RevAiAPIClient(token)
    # submit a file for transcription
    job = client.submit_job_url(url)
    # get job id
    job_id = job.id
    print("Job submitted with id: " + job_id)
    # poll the job status until the job finishes
    while True:
        details = client.get_job_details(job_id)
        print("Job status: " + details.status.name)
        # if successful, print result
        if details.status.name == 'TRANSCRIBED':
            print(client.get_transcript_json(job_id))
            break
        # if unsuccessful, print error
        if details.status.name == 'FAILED':
            print("Job failed: " + str(details.failure_detail))
            break
        # wait 30 seconds before checking again
        sleep(30)

token = "<REVAI_ACCESS_TOKEN>"
url = "<URL>"
main(token, url)
This example application begins by initializing an instance of the RevAiAPIClient object, passing the Rev AI access token to the object constructor. It then submits a remote file for transcription using the object's submit_job_url() method, and uses the get_job_details() method to poll the API every 30 seconds for the status of the job. Once the job status changes to TRANSCRIBED, it uses the get_transcript_json() method to retrieve the transcript and prints it to the console. If the status changes to FAILED instead, it prints the failure detail and stops.
Here is an example of the output generated by the example application:
Job submitted with id: XyHxoqX5cH5A
Job status: IN_PROGRESS
Job status: IN_PROGRESS
Job status: TRANSCRIBED
{'monologues': [{'speaker': 0, 'elements': [{'type': 'text', 'value': 'Hi', 'ts': 0.17, 'end_ts': 0.52, 'confidence': 1.0}, {'type': 'punct', 'value': ','}, ...]}, ..., ]}
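A polling loop like the one in this application can run indefinitely if the job never leaves the IN_PROGRESS state. One way to guard against that is to give the loop an upper time limit. The following sketch separates the waiting logic from the SDK so it can be tried without network access. The function name wait_for_job, its parameters and the 600-second default are all hypothetical choices for illustration.

```python
import time


def wait_for_job(fetch_status, interval=30, timeout=600, sleep=time.sleep):
    """Poll fetch_status() until the job finishes or the timeout elapses.

    fetch_status is any zero-argument callable returning a status name
    such as 'IN_PROGRESS', 'TRANSCRIBED' or 'FAILED'.
    """
    waited = 0  # counts time spent sleeping, not time spent in requests
    while True:
        status = fetch_status()
        if status != "IN_PROGRESS":
            return status
        if waited >= timeout:
            raise TimeoutError("Job still in progress after " + str(waited) + " seconds")
        sleep(interval)
        waited += interval


# Usage with an SDK client and job id like those in the application above:
#   status = wait_for_job(lambda: client.get_job_details(job_id).status.name)
```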
Warning: The example above polls the API repeatedly to check the status of the transcription job. This is presented only for illustrative purposes and is strongly discouraged in production scenarios. For production scenarios, use webhooks to asynchronously receive notifications once the job completes.
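To give a feel for the webhook approach, here is a rough sketch of a receiver built with Python's standard library. It rests on assumptions that this tutorial does not confirm: that the notification arrives as an HTTP POST with a JSON body shaped like {"job": {"id": ..., "status": ...}}, and that a 200 response is an acceptable acknowledgement. The names parse_notification, WebhookHandler and serve, and the port number, are hypothetical. How to register the receiver's URL when submitting a job is covered in the webhooks tutorial listed under Next steps.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_notification(body):
    """Extract the job id and status from a webhook request body.

    Assumption: the body is JSON shaped like {"job": {"id": ..., "status": ...}}.
    """
    job = json.loads(body).get("job", {})
    return job.get("id"), job.get("status")


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes announced by the sender
        length = int(self.headers.get("Content-Length", 0))
        job_id, status = parse_notification(self.rfile.read(length))
        print("Job " + str(job_id) + " finished with status " + str(status))
        # Acknowledge receipt of the notification
        self.send_response(200)
        self.end_headers()


def serve(port=8000):
    """Start the receiver; the port number is an arbitrary choice."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Calling serve() starts the receiver. For notifications to arrive, the machine running it must be reachable from the public internet.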
Next steps
Learn more about the topics discussed in this tutorial by visiting the following links:
- Documentation: Asynchronous Speech-To-Text API job submission
- Documentation: Python SDK
- Documentation: Asynchronous Speech-To-Text API best practices
- Code samples: Asynchronous Speech-To-Text API and Python SDK
- Tutorial: Get Started with Rev AI API Webhooks