# Get Started with Speech Recognition in Python

**By Vikram Vaswani, Developer Advocate - September 27, 2022**

## Introduction

Rev AI's speech-to-text APIs power automatic speech recognition in thousands of applications and services. To make it easier for developers to integrate these APIs into their applications, Rev AI also offers [SDKs for many programming languages](/sdk), including the topic of this tutorial, Python.

In this tutorial, I'll introduce you to the basics of using Rev AI's [Asynchronous Speech-to-Text API](/api/asynchronous) with Python and the [Rev AI Python SDK](/sdk/python). If you've ever wondered how to integrate speech recognition capabilities with your Python application, this tutorial will give you all the information you need to get started.

## Assumptions

This tutorial assumes that:

- You have a Rev AI account and access token. If not, [sign up for a free account](https://www.rev.ai/auth/signup) and [generate an access token](/get-started#step-1-get-your-access-token).
- You have a properly configured Python development environment with Python 3.x. If not, [download and install Python](https://www.python.org/downloads/) for your operating system.
- You have installed pip, the Python dependency manager. If not, [download and install pip](https://pip.pypa.io/en/stable/installation/) for your operating system.
- You have an audio file to transcribe. If not, use this [example audio file from Rev AI](https://www.rev.ai/FTC_Sample_1.mp3).

## Step 1: Install the SDK

This tutorial will use the [Rev AI Python SDK](/sdk/python) to submit transcription requests to the Rev AI [Asynchronous Speech-to-Text API](/api/asynchronous).

Begin by installing the SDK with pip:

```bash
pip install --upgrade rev_ai
```

Within your application code, initialize the Rev AI API client as below.
Replace the `<REVAI_ACCESS_TOKEN>` placeholder with your Rev AI access token:

```python
from rev_ai import apiclient

# configure access token
token = "<REVAI_ACCESS_TOKEN>"

# initialize Rev AI API client
client = apiclient.RevAiAPIClient(token)
```

Here, the Rev AI API client is automatically initialized with the base endpoint for the Asynchronous Speech-to-Text API, which is `https://api.rev.ai/speechtotext/v1/`.

Every request to the API must be in JSON format and must include an `Authorization` header containing the API access token. The Rev AI Python SDK automatically takes care of attaching this required header to all its client requests.

## Step 2: Submit a file for transcription

To generate a transcript from an audio file, you must submit an HTTP POST request to the API endpoint at `https://api.rev.ai/speechtotext/v1/jobs`. The Rev AI Python SDK simplifies this process with two methods: `submit_job_local_file()` and `submit_job_url()`, for local and remote files respectively.

The following example demonstrates how to submit a local audio file for transcription. To use this example, replace the `<FILEPATH>` placeholder with the path to the file you wish to transcribe and the `<REVAI_ACCESS_TOKEN>` placeholder with your Rev AI account's access token.

```python
from rev_ai import apiclient

# configure access token and audio source
token = "<REVAI_ACCESS_TOKEN>"
filepath = "<FILEPATH>"

# initialize Rev AI API client
client = apiclient.RevAiAPIClient(token)

# submit a file for transcription
job = client.submit_job_local_file(filepath)

# get job id
job_id = job.id
print("Job submitted with id: " + job_id)
```

To run this example, save it as a file, such as `example.py`, and then execute `python example.py`.

In this example, the API client internally makes a POST request to the API, passing it the audio file to be transcribed. The response body is then received and converted into a Python object.
Here is an example of the API response, represented as a Python object:

```python
{'callback_url': (None,),
 'completed_on': None,
 'created_on': '2022-09-14T14:43:35.46Z',
 'custom_vocabulary_id': None,
 'delete_after_seconds': None,
 'duration_seconds': None,
 'failure': None,
 'failure_detail': None,
 'filter_profanity': None,
 'id': 'xsDRpD6ladtf',
 'language': 'en',
 'media_url': None,
 'metadata': None,
 'name': 'myfile.mp3',
 'remove_disfluencies': None,
 'rush': None,
 'segments_to_transcribe': None,
 'skip_diarization': None,
 'skip_punctuation': None,
 'speaker_channels_count': None,
 'status': <JobStatus.IN_PROGRESS: 1>,
 'transcriber': None,
 'verbatim': None}
```

The API response contains a job identifier (`id` field). This job identifier will be required to check the job status and obtain the job result.

It is also possible to use a remote audio file, as shown in the following example:

```python
from rev_ai import apiclient

# configure access token and audio source
token = "<REVAI_ACCESS_TOKEN>"
url = "<URL>"

# initialize Rev AI API client
client = apiclient.RevAiAPIClient(token)

# submit a file for transcription
job = client.submit_job_url(url)

# get job id
job_id = job.id
print("Job submitted with id: " + job_id)
```

[Learn more about submitting an asynchronous transcription job in the API reference guide](/api/asynchronous).

## Step 3: Check transcription status

To check the status of the transcription job, you must submit an HTTP GET request to the API endpoint at `https://api.rev.ai/speechtotext/v1/jobs/<ID>`, where `<ID>` is a placeholder for the job identifier. Again, the Rev AI Python SDK makes this easy with its `get_job_details()` method, which accepts a job identifier as input and returns the current status of the job as a Python object.

The following example demonstrates how to check the status of an asynchronous transcription job. To use this example, replace the `<ID>` placeholder with the job identifier and the `<REVAI_ACCESS_TOKEN>` placeholder with your Rev AI account's access token.
```python
from rev_ai import apiclient

# configure access token and job identifier
token = "<REVAI_ACCESS_TOKEN>"
job_id = "<ID>"

# initialize Rev AI API client
client = apiclient.RevAiAPIClient(token)

# check job status
status = client.get_job_details(job_id)

# print response object
print(vars(status))
```

Here is an example of the response object received after the job has completed:

```python
{'callback_url': (None,),
 'completed_on': '2022-09-14T14:44:09.774Z',
 'created_on': '2022-09-14T14:43:35.46Z',
 'custom_vocabulary_id': None,
 'delete_after_seconds': None,
 'duration_seconds': 107.0,
 'failure': None,
 'failure_detail': None,
 'filter_profanity': None,
 'id': 'xsDRpD6ladtf',
 'language': 'en',
 'media_url': None,
 'metadata': None,
 'name': 'myfile.mp3',
 'remove_disfluencies': None,
 'rush': None,
 'segments_to_transcribe': None,
 'skip_diarization': None,
 'skip_punctuation': None,
 'speaker_channels_count': None,
 'status': <JobStatus.TRANSCRIBED: 2>,
 'transcriber': None,
 'verbatim': None}
```

[Learn more about retrieving the status of an asynchronous transcription job in the API reference guide](/api/asynchronous).

## Step 4: Retrieve the transcript

Once the job's `status` changes to `TRANSCRIBED`, you can retrieve the results by submitting an HTTP GET request to the API endpoint at `https://api.rev.ai/speechtotext/v1/jobs/<ID>/transcript`, where `<ID>` is a placeholder for the job identifier. The Rev AI Python SDK offers three methods for this: `get_transcript_text()`, `get_transcript_json()` and `get_transcript_object()`, which return the transcript as plaintext, JSON and a Python object respectively.

The following example demonstrates how to retrieve the results of an asynchronous transcription job. To use this example, replace the `<ID>` placeholder with the job identifier and the `<REVAI_ACCESS_TOKEN>` placeholder with your Rev AI account's access token.
```python
from rev_ai import apiclient

# configure access token and job identifier
token = "<REVAI_ACCESS_TOKEN>"
job_id = "<ID>"

# initialize Rev AI API client
client = apiclient.RevAiAPIClient(token)

# get transcript
transcript = client.get_transcript_json(job_id)

# print transcript
print(transcript)
```

Here is an example of the transcript returned from a successful job, represented as JSON:

```javascript
{
  "monologues": [
    {
      "speaker": 0,
      "elements": [
        { "type": "text", "value": "Hi", "ts": 0.17, "end_ts": 0.52, "confidence": 1 },
        { "type": "punct", "value": "," },
        { "type": "punct", "value": " " },
        { "type": "text", "value": "my", "ts": 0.52, "end_ts": 0.76, "confidence": 1 },
        ...
      ]
    },
    ...
  ]
}
```

[Learn more about obtaining a transcript in the API reference guide](/api/asynchronous).

## Step 5: Create and test a simple application

Using the code samples shown previously, it's possible to create a simple application that accepts an audio file URL and returns a transcript, as shown below:

```python
from rev_ai import apiclient
from time import sleep

def main(token, url):
    # initialize Rev AI API client
    client = apiclient.RevAiAPIClient(token)

    # submit a file for transcription
    job = client.submit_job_url(url)

    # get job id
    job_id = job.id
    print("Job submitted with id: " + job_id)

    # check job status
    # note: `job` holds the status at submission time and is not refreshed,
    # so the loop ends through the `break` statements below
    while job.status.name == 'IN_PROGRESS':
        details = client.get_job_details(job_id)
        print("Job status: " + details.status.name)

        # if successful, print result
        if details.status.name == 'TRANSCRIBED':
            print(client.get_transcript_json(job_id))
            break

        # if unsuccessful, print error
        if details.status.name == 'FAILED':
            print("Job failed: " + str(details.failure_detail))
            break

        # wait before polling again
        sleep(30)

token = "<REVAI_ACCESS_TOKEN>"
url = "<URL>"
main(token, url)
```

This example application begins by initializing an instance of the `RevAiAPIClient` object, passing the Rev AI access token to the object constructor. It then submits a remote file for transcription using the object's `submit_job_url()` method.
It then uses the `get_job_details()` method to poll the API every 30 seconds for the status of the job. Once the job status changes to `TRANSCRIBED`, it uses the `get_transcript_json()` method to retrieve the transcript and prints it to the console. If the job status changes to `FAILED` instead, it prints the failure detail and stops.

Here is an example of the output generated by the example application:

```bash
Job submitted with id: XyHxoqX5cH5A
Job status: IN_PROGRESS
Job status: IN_PROGRESS
Job status: TRANSCRIBED
{'monologues': [{'speaker': 0, 'elements': [{'type': 'text', 'value': 'Hi', 'ts': 0.17, 'end_ts': 0.52, 'confidence': 1.0}, {'type': 'punct', 'value': ','}, ...]}, ..., ]}
```

The example above polls the API repeatedly to check the status of the transcription job. This is presented only for illustrative purposes and is **strongly discouraged** in production scenarios. For production scenarios, use [webhooks](/api/asynchronous/webhooks) to asynchronously receive notifications once the job completes.

## Next steps

Learn more about the topics discussed in this tutorial by visiting the following links:

- Documentation: [Asynchronous Speech-To-Text API job submission](/api/asynchronous)
- Documentation: [Python SDK](/sdk/python)
- Documentation: [Asynchronous Speech-To-Text API best practices](/api/asynchronous/best-practices)
- Code samples: [Asynchronous Speech-To-Text API](/api/asynchronous/code-samples) and [Python SDK](/sdk/python/code-samples)
- Tutorial: [Get Started with Rev AI API Webhooks](/resources/tutorials/get-started-api-webhooks)
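As a supplement to Step 4: the JSON transcript shown there splits each monologue into `text` and `punct` elements, with spaces carried as their own `punct` entries. If you want readable text from that structure, one possible approach is to concatenate the `value` fields. The sketch below is only an illustration and is not part of the Rev AI SDK: the `transcript_to_text()` helper and the `Speaker N:` label format are hypothetical, and the `sample` dictionary simply reuses the elements from the Step 4 example.

```python
def transcript_to_text(transcript):
    """Flatten a JSON transcript (as a dict) into one line per monologue.

    Hypothetical helper for illustration; it assumes the
    monologues/elements layout shown in Step 4.
    """
    lines = []
    for monologue in transcript.get("monologues", []):
        # text and punctuation (including spaces) are separate elements,
        # so joining their values rebuilds the spoken sentence
        text = "".join(element["value"] for element in monologue.get("elements", []))
        lines.append("Speaker {}: {}".format(monologue.get("speaker"), text))
    return "\n".join(lines)


# sample data copied from the Step 4 transcript excerpt
sample = {
    "monologues": [
        {
            "speaker": 0,
            "elements": [
                {"type": "text", "value": "Hi", "ts": 0.17, "end_ts": 0.52, "confidence": 1},
                {"type": "punct", "value": ","},
                {"type": "punct", "value": " "},
                {"type": "text", "value": "my", "ts": 0.52, "end_ts": 0.76, "confidence": 1},
            ],
        }
    ]
}

print(transcript_to_text(sample))  # prints "Speaker 0: Hi, my"
```

In an application, you could pass the dictionary returned by `get_transcript_json()` to this helper. If plain text is all you need, the SDK's `get_transcript_text()` method described in Step 4 is likely the simpler choice.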