Get Started with Speech Recognition in Python

By Vikram Vaswani, Developer Advocate - September 27, 2022

Introduction

Rev AI's speech-to-text APIs power automatic speech recognition in thousands of applications and services. To make it easier for developers to integrate these APIs into their applications, Rev AI also offers SDKs for many programming languages...including the topic of this tutorial, Python.

In this tutorial, I'll introduce you to the basics of using Rev AI's Asynchronous Speech-to-Text API with Python and the Rev AI Python SDK. If you've ever wondered how to add speech recognition capabilities to your Python application, this tutorial will give you all the information you need to get started.

Assumptions

This tutorial assumes that:

You have a Rev AI account and access token. If not, sign up for a free account and generate an access token.
You have a properly configured Python development environment with Python 3.x. If not, download and install Python for your operating system.
You have installed pip, the Python package installer. If not, download and install pip for your operating system.
You have an audio file to transcribe. If not, use this example audio file from Rev AI.

Step 1: Install the SDK

This tutorial will use the Rev AI Python SDK to submit transcription requests to the Rev AI Asynchronous Speech-to-Text API.

Begin by installing the SDK with pip:

pip install --upgrade rev_ai
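To confirm that the SDK is visible to the Python interpreter you will use to run your code, a quick standard-library check such as the following can help. The helper name sdk_installed() is hypothetical, and the check assumes the package's import name is rev_ai, as used in the examples in this tutorial.

```python
import importlib.util


def sdk_installed():
    """Return True if the rev_ai package can be found by this interpreter."""
    # find_spec() returns None (rather than raising) for a missing top-level package
    return importlib.util.find_spec("rev_ai") is not None


print("Rev AI SDK installed:", sdk_installed())
```

If this prints False, the SDK was most likely installed into a different Python environment than the one running the script.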

Within your application code, initialize the Rev AI API client as shown below. Replace the <REVAI_ACCESS_TOKEN> placeholder with your Rev AI access token:

from rev_ai import apiclient

# configure access token
token = "<REVAI_ACCESS_TOKEN>"

# initialize Rev AI API client
client = apiclient.RevAiAPIClient(token)

Here, the Rev AI API client is automatically initialized with the base endpoint for the Asynchronous Speech-to-Text API, which is https://api.rev.ai/speechtotext/v1/.

Every request to the API must include an Authorization header containing the API access token. Request bodies are sent as JSON, or as multipart form data when uploading a local file. The Rev AI Python SDK automatically attaches the required header to all its client requests and handles the request encoding for you.
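To show roughly what the SDK does on your behalf, here is a minimal standard-library sketch that builds (but does not send) an authenticated request for a job's details. It makes two assumptions not stated in this tutorial: the header uses the `Bearer <token>` scheme, and the token is read from an environment variable named REVAI_ACCESS_TOKEN (a hypothetical name chosen here to avoid hard-coding secrets). The helper names are also hypothetical.

```python
import os
import urllib.parse
import urllib.request

# Base endpoint of the Asynchronous Speech-to-Text API, as given in this tutorial
BASE_URL = "https://api.rev.ai/speechtotext/v1/"


def load_token(var_name="REVAI_ACCESS_TOKEN"):
    """Read the access token from an environment variable (hypothetical name)."""
    token = os.environ.get(var_name)
    if not token:
        raise RuntimeError("Set the " + var_name + " environment variable first")
    return token


def build_job_details_request(token, job_id):
    """Build, but do not send, an authenticated GET request for a job's details."""
    url = BASE_URL + "jobs/" + urllib.parse.quote(job_id, safe="")
    request = urllib.request.Request(url, method="GET")
    # Assumption: the API expects the token as a Bearer credential
    request.add_header("Authorization", "Bearer " + token)
    return request
```

With the SDK you never need this code; it is only meant to make the required header visible.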

Step 2: Submit a file for transcription

To generate a transcript from an audio file, you must submit an HTTP POST request to the API endpoint at https://api.rev.ai/speechtotext/v1/jobs. The Rev AI Python SDK simplifies this process with two methods: submit_job_local_file() and submit_job_url(), for local and remote files respectively.

The following example demonstrates how to submit a local audio file for transcription.

To use this example, replace the <FILEPATH> placeholder with the path to the file you wish to transcribe and the <REVAI_ACCESS_TOKEN> placeholder with your Rev AI account's access token.

from rev_ai import apiclient

# configure access token and audio source
token = "<REVAI_ACCESS_TOKEN>"
filepath = "<FILEPATH>"

# initialize Rev AI API client
client = apiclient.RevAiAPIClient(token)

# submit a file for transcription
job = client.submit_job_local_file(filepath)

# get job id
job_id = job.id
print("Job submitted with id: " + job_id)

To run this example, save it as a file, such as example.py, and then execute python example.py.

In this example, the API client internally makes a POST request to the API, passing it the audio file to be transcribed. The response body is then received and converted into a Python object.

Here is an example of the API response, represented as a Python object:

{'callback_url': (None,),
 'completed_on': None,
 'created_on': '2022-09-14T14:43:35.46Z',
 'custom_vocabulary_id': None,
 'delete_after_seconds': None,
 'duration_seconds': None,
 'failure': None,
 'failure_detail': None,
 'filter_profanity': None,
 'id': 'xsDRpD6ladtf',
 'language': 'en',
 'media_url': None,
 'metadata': None,
 'name': 'myfile.mp3',
 'remove_disfluencies': None,
 'rush': None,
 'segments_to_transcribe': None,
 'skip_diarization': None,
 'skip_punctuation': None,
 'speaker_channels_count': None,
 'status': <JobStatus.IN_PROGRESS: 1>,
 'transcriber': None,
 'verbatim': None}

The API response contains a job identifier (id field). This job identifier will be required to check the job status and obtain the job result.
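The job identifier is all you need to address the job later. As an illustration, this hypothetical helper derives the two endpoints used in Steps 3 and 4 from the identifier. The SDK presumably builds these URLs internally, so a helper like this is only useful if you call the API directly.

```python
import urllib.parse

# Base endpoint of the Asynchronous Speech-to-Text API, as given in this tutorial
BASE_URL = "https://api.rev.ai/speechtotext/v1/"


def job_urls(job_id):
    """Return the status and result endpoints for a job identifier."""
    job_url = BASE_URL + "jobs/" + urllib.parse.quote(job_id, safe="")
    return {
        "details": job_url,             # GET: check transcription status
        "result": job_url + "/result",  # GET: retrieve the transcript
    }
```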

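You can also submit a remote audio file by passing its URL to the submit_job_url() method. To use the following example, replace the <URL> placeholder with the URL of the audio file and the <REVAI_ACCESS_TOKEN> placeholder with your Rev AI account's access token: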

from rev_ai import apiclient

# configure access token and audio source
token = "<REVAI_ACCESS_TOKEN>"
url = "<URL>"

# initialize Rev AI API client
client = apiclient.RevAiAPIClient(token)

# submit a file for transcription
job = client.submit_job_url(url)

# get job id
job_id = job.id
print("Job submitted with id: " + job_id)
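If your application accepts both local paths and URLs, you can choose the SDK method at runtime. The sketch below uses a hypothetical submit_source() helper that treats anything with an http or https scheme as a remote file and everything else as a local path; that rule is an assumption you may want to tighten. It expects an apiclient.RevAiAPIClient, or any object with the same two methods.

```python
from urllib.parse import urlparse


def submit_source(client, source):
    """Submit a URL or a local file path using the matching SDK method."""
    if urlparse(source).scheme in ("http", "https"):
        # remote file: the API fetches the media from this URL
        return client.submit_job_url(source)
    # anything else is treated as a path on the local filesystem
    return client.submit_job_local_file(source)
```

Because the client is passed in, the helper can be exercised with a stand-in object, without an access token or network access.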

Learn more about submitting an asynchronous transcription job in the API reference guide.

Step 3: Check transcription status

To check the status of the transcription job, you must submit an HTTP GET request to the API endpoint at https://api.rev.ai/speechtotext/v1/jobs/<ID>, where <ID> is a placeholder for the job identifier. Again, the Rev AI Python SDK makes this easy with its get_job_details() method, which accepts a job identifier as input and returns the current status of the job as a Python object.

The following example demonstrates how to check the status of an asynchronous transcription job.

To use this example, replace the <ID> placeholder with the job identifier and the <REVAI_ACCESS_TOKEN> placeholder with your Rev AI account's access token.

from rev_ai import apiclient

# configure access token and job identifier
token = "<REVAI_ACCESS_TOKEN>"
job_id = "<ID>"

# initialize Rev AI API client
client = apiclient.RevAiAPIClient(token)

# check job status
status = client.get_job_details(job_id)

# print response object
print(vars(status))

Here is an example of the response object received after the job has completed:

{'callback_url': (None,),
 'completed_on': '2022-09-14T14:44:09.774Z',
 'created_on': '2022-09-14T14:43:35.46Z',
 'custom_vocabulary_id': None,
 'delete_after_seconds': None,
 'duration_seconds': 107.0,
 'failure': None,
 'failure_detail': None,
 'filter_profanity': None,
 'id': 'xsDRpD6ladtf',
 'language': 'en',
 'media_url': None,
 'metadata': None,
 'name': 'myfile.mp3',
 'remove_disfluencies': None,
 'rush': None,
 'segments_to_transcribe': None,
 'skip_diarization': None,
 'skip_punctuation': None,
 'speaker_channels_count': None,
 'status': <JobStatus.TRANSCRIBED: 2>,
 'transcriber': None,
 'verbatim': None}
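The job details include created_on and completed_on timestamps. As an illustration of working with these fields, this hypothetical helper computes how long processing took, starting from the dictionary returned by vars(status) in the example above. It assumes timestamps always use the UTC "Z" format shown in the sample, with or without fractional seconds.

```python
from datetime import datetime, timezone


def parse_timestamp(value):
    """Parse timestamps such as '2022-09-14T14:43:35.46Z' into aware datetimes."""
    for fmt in ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ"):
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # try the next accepted format
    raise ValueError("Unrecognized timestamp: " + value)


def processing_seconds(details):
    """Seconds between job creation and completion, or None if not complete."""
    if details["completed_on"] is None:
        return None
    created = parse_timestamp(details["created_on"])
    completed = parse_timestamp(details["completed_on"])
    return (completed - created).total_seconds()
```

For the sample job above, this gives about 34.3 seconds of processing for 107 seconds of audio.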

Learn more about retrieving the status of an asynchronous transcription job in the API reference guide.

Step 4: Retrieve the transcript

Once the job's status changes to TRANSCRIBED, you can retrieve the results by submitting an HTTP GET request to the API endpoint at https://api.rev.ai/speechtotext/v1/jobs/<ID>/result, where <ID> is a placeholder for the job identifier. The Rev AI Python SDK offers three methods for this: get_transcript_text(), get_transcript_json() and get_transcript_object(), which return the transcript as plaintext, JSON and a Python object respectively.

The following example demonstrates how to retrieve the results of an asynchronous transcription job.

To use this example, replace the <ID> placeholder with the job identifier and the <REVAI_ACCESS_TOKEN> placeholder with your Rev AI account's access token.

from rev_ai import apiclient

# configure access token and job identifier
token = "<REVAI_ACCESS_TOKEN>"
job_id = "<ID>"

# initialize Rev AI API client
client = apiclient.RevAiAPIClient(token)

# get transcript
transcript = client.get_transcript_json(job_id)

# print transcript
print(transcript)

Here is an example of the transcript returned from a successful job, represented as JSON:

{
  "monologues": [
    {
      "speaker": 0,
      "elements": [
        {
          "type": "text",
          "value": "Hi",
          "ts": 0.17,
          "end_ts": 0.52,
          "confidence": 1
        },
        {
          "type": "punct",
          "value": ","
        },
        {
          "type": "punct",
          "value": " "
        },
        {
          "type": "text",
          "value": "my",
          "ts": 0.52,
          "end_ts": 0.76,
          "confidence": 1
        },
        ...
      ]
    },
    ...
  ]
}
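get_transcript_text() already returns plaintext, but walking the JSON yourself shows how the structure fits together and gives you access to per-word timings. The sketch below uses two hypothetical helpers. It assumes, as in the sample above, that spacing between words is carried by "punct" elements, so concatenating element values reproduces the sentence, and that only "text" elements have ts and end_ts fields.

```python
def transcript_to_text(transcript):
    """Join each monologue's element values into one line per speaker turn."""
    lines = []
    for monologue in transcript["monologues"]:
        # "text" and "punct" elements both carry a "value"; punct elements
        # include the spaces between words, so plain concatenation works
        text = "".join(element["value"] for element in monologue["elements"])
        lines.append("Speaker {}: {}".format(monologue["speaker"], text))
    return "\n".join(lines)


def word_timings(transcript):
    """List (word, start, end) tuples for every "text" element."""
    return [
        (element["value"], element["ts"], element["end_ts"])
        for monologue in transcript["monologues"]
        for element in monologue["elements"]
        if element["type"] == "text"
    ]
```

Applied to the first four elements of the sample above, transcript_to_text() returns "Speaker 0: Hi, my".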

Learn more about obtaining a transcript in the API reference guide.

Step 5: Create and test a simple application

Using the code samples shown previously, it's possible to create a simple application that accepts an audio file URL and returns a transcript, as shown below:

from time import sleep

from rev_ai import apiclient


def main(token, url):
    # initialize Rev AI API client
    client = apiclient.RevAiAPIClient(token)

    # submit a remote file for transcription
    job = client.submit_job_url(url)

    # get job id
    job_id = job.id
    print("Job submitted with id: " + job_id)

    # poll the job status every 30 seconds until the job finishes
    while True:
        details = client.get_job_details(job_id)
        print("Job status: " + details.status.name)
        # if successful, print result
        if details.status.name == "TRANSCRIBED":
            print(client.get_transcript_json(job_id))
            break
        # if unsuccessful, print error (failure_detail may be None)
        if details.status.name == "FAILED":
            print("Job failed: " + str(details.failure_detail))
            break
        sleep(30)


if __name__ == "__main__":
    token = "<REVAI_ACCESS_TOKEN>"
    url = "<URL>"
    main(token, url)

This example application begins by initializing an instance of the RevAiAPIClient object, passing the Rev AI access token to the object constructor. Next, it submits a remote file for transcription using the object's submit_job_url() method. It then uses the get_job_details() method to poll the API every 30 seconds for the status of the job. When the status changes to TRANSCRIBED, it uses the get_transcript_json() method to retrieve the transcript and prints it to the console; if the status changes to FAILED, it prints the failure details instead.

Here is an example of the output generated by the example application:

Job submitted with id: XyHxoqX5cH5A
Job status: IN_PROGRESS
Job status: IN_PROGRESS
Job status: TRANSCRIBED
{'monologues': [{'speaker': 0, 'elements': [{'type': 'text', 'value': 'Hi', 'ts': 0.17, 'end_ts': 0.52, 'confidence': 1.0}, {'type': 'punct', 'value': ','}, ...]}, ..., ]}
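If you do poll, for example in a quick script, it helps to cap the number of checks so that a stuck job cannot keep the loop running forever. The following sketch of a hypothetical wait_for_transcript() helper adds two things not found in the example application: a max_attempts limit, and an injectable sleep function that makes the helper testable without actually waiting. The default values are arbitrary.

```python
import time


def wait_for_transcript(client, job_id, interval=30, max_attempts=20, sleep=time.sleep):
    """Poll a job until it is TRANSCRIBED or FAILED, giving up after max_attempts.

    Returns the transcript JSON on success, raises RuntimeError if the job
    failed, and raises TimeoutError if it is still running after max_attempts checks.
    """
    for attempt in range(max_attempts):
        details = client.get_job_details(job_id)
        status = details.status.name
        if status == "TRANSCRIBED":
            return client.get_transcript_json(job_id)
        if status == "FAILED":
            raise RuntimeError("Job failed: " + str(details.failure_detail))
        # still in progress: wait before the next check, but not after the last one
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError("Job {} still in progress after {} checks".format(job_id, max_attempts))
```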

Warning: The example above repeatedly polls the API to check the status of the transcription job. This approach is shown only for illustration and is not recommended in production. In production, use webhooks to receive an asynchronous notification when the job completes.
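The job object returned in Step 2 includes a callback_url field, which is where webhook notifications are sent. As a rough sketch of the receiving side, here is a minimal standard-library handler. It assumes the notification is a POST whose JSON body wraps the job details in a "job" key, for example {"job": {"id": ..., "status": ...}}; that payload shape and the exact status strings are assumptions, so the handler simply prints what it receives. The helper names and the default port are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_job_notification(body):
    """Extract (job_id, status) from a webhook body.

    Assumption: the body is JSON shaped like {"job": {"id": ..., "status": ...}}.
    """
    job = json.loads(body)["job"]
    return job["id"], job["status"]


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            job_id, status = parse_job_notification(self.rfile.read(length))
        except (ValueError, KeyError, TypeError):
            # malformed or unexpected payload
            self.send_response(400)
            self.end_headers()
            return
        print("Job {} finished with status {}".format(job_id, status))
        # acknowledge receipt of the notification
        self.send_response(200)
        self.end_headers()


def run_server(port=8000):
    """Serve webhook notifications on the given port until interrupted."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Calling run_server() starts listening. In practice the callback URL you register must be publicly reachable and route to this server; when a notification arrives, you can fetch the transcript with get_transcript_json() as in Step 4.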

Next steps

Learn more about the topics discussed in this tutorial by visiting the following links:

Documentation: Asynchronous Speech-To-Text API job submission
Documentation: Python SDK
Documentation: Asynchronous Speech-To-Text API best practices
Code samples: Asynchronous Speech-To-Text API and Python SDK
Tutorial: Get Started with Rev AI API Webhooks