Transcribe Audio with Automatic Language Identification
By Vikram Vaswani, Developer Advocate - May 16, 2022
Introduction
Rev AI's Asynchronous Speech-to-Text API is able to transcribe spoken audio even if it's not in English – simply specify the language of the audio file in your transcription job request. However, this assumes that your application is able to identify the language before requesting transcription...and this may not always be the case.
That's where Rev AI's Language Identification API comes in. This API is able to automatically identify the most probable language used in an audio file. It accepts and analyzes an input audio file and returns a list of possible languages, ranked by confidence.
A unique feature of the Language Identification API is that it performs language identification without requiring a list of possible language codes upfront. This feature eliminates the need to first acquire and validate information on language possibilities, reducing work (and code dependencies) for developers.
This tutorial explains how to integrate the Language Identification API with the Asynchronous Speech-to-Text API. It uses a webhook to create a seamless, asynchronous language identification and transcription process for use in ASR applications.
Assumptions
This tutorial assumes that:
- You have a Rev AI account and access token. If not, sign up for a free account and generate an access token.
- You have a properly-configured Node.js development environment with Node.js v16.x or v17.x. If not, download and install Node.js for your operating system.
- You have some familiarity with webhooks. If not, learn the basics of using Rev AI API webhooks and then read about using webhooks to send email notifications on job completion.
- You have some familiarity with the Express framework. If not, familiarize yourself with the basics using this example application.
- Your webhook will be available at a public URL. If not, or if you prefer to develop and test locally, download and install ngrok to generate a temporary public URL for your webhook.
- You have an audio file to transcribe. If not, use this example audio file from Rev AI.
Technical approach
There are two stages in performing transcription with automatic language identification.
Stage 1: Language identification
To perform language identification on an audio file, you must submit an HTTP POST request with various parameters (including either the audio file or its URL) to the API endpoint at https://api.rev.ai/languageid/v1/jobs. Here is an example request:
curl -X POST "https://api.rev.ai/languageid/v1/jobs" \
-H "Authorization: Bearer <REVAI_ACCESS_TOKEN>" \
-H "Content-Type: application/json" \
-d '{"source_config": {"url": "https://www.rev.ai/FTC_Sample_1.mp3"},"notification_config": {"url": "https://example.com/callback"}}'
When a webhook URL is included with the job parameters, as in the example above, then, on job completion, the Language Identification API will send an HTTP POST request containing the job status to the specified webhook URL.
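For reference, the notification body is a JSON document describing the job. The exact payload may contain additional fields; the sketch below shows only the properties that the webhook handler built later in this tutorial relies on, with <JOB_ID> as a placeholder:
{
  "job": {
    "id": "<JOB_ID>",
    "type": "language_id",
    "status": "completed",
    "media_url": "https://www.rev.ai/FTC_Sample_1.mp3",
    "callback_url": "https://example.com/callback"
  }
}
The Stage 2 transcription notification described below has the same overall shape, with type "async" and a successful status of "transcribed".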
The webhook URL handler will receive and parse this status and, if the job was successful, it will make a GET request to the API endpoint at https://api.rev.ai/languageid/v1/jobs/<ID>/result to obtain the list of identified languages. The most probable language for the submitted audio file is specified in the top_language property of the response. Here is an example response:
{
  "top_language": "en",
  "language_confidences": [
    {
      "language": "en",
      "confidence": 0.907
    },
    {
      "language": "nl",
      "confidence": 0.023
    }
  ]
}
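The language_confidences list can also be used to guard against ambiguous audio before committing to a transcription language. Here is a minimal sketch, assuming the response shape shown above; the threshold and fallback values are arbitrary choices for illustration:
// pick the identified language only if its confidence meets a threshold,
// otherwise fall back to a default (threshold and fallback are illustrative)
const chooseLanguage = (result, threshold = 0.5, fallback = 'en') => {
  const top = result.language_confidences.find(
    (item) => item.language === result.top_language
  );
  return top && top.confidence >= threshold ? result.top_language : fallback;
};

// example: for the response above, chooseLanguage(result) returns 'en' (0.907 >= 0.5)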
Stage 2: Transcription
With the language identification complete, the webhook URL handler will then trigger a new transcription request to the Asynchronous Speech-to-Text API endpoint at https://api.rev.ai/speechtotext/v1/jobs, passing along the identified language with the request. Here is an example request:
curl -X POST "https://api.rev.ai/speechtotext/v1/jobs" \
-H "Authorization: Bearer <REVAI_ACCESS_TOKEN>" \
-H "Content-Type: application/json" \
-d '{"source_config": {"url": "https://www.rev.ai/FTC_Sample_1.mp3"},"language":"en","notification_config": {"url": "https://example.com/callback"}}'
Here too, since a webhook URL is included with the job parameters, the Asynchronous Speech-to-Text API will send an HTTP POST request containing the job status to the specified webhook URL once the job is complete.
The webhook URL handler will check this status and, if it indicates success, it will make a GET request to the API endpoint at https://api.rev.ai/speechtotext/v1/jobs/<ID>/transcript to obtain the final transcript. Here is an example transcript response from the API:
{
  "monologues": [
    {
      "speaker": 1,
      "elements": [
        {
          "type": "text",
          "value": "Hi",
          "ts": 0.27,
          "end_ts": 0.32,
          "confidence": 1
        },
        {
          "type": "punct",
          "value": ","
        },
        {
          "type": "punct",
          "value": " "
        },
        {
          "type": "text",
          "value": "my",
          "ts": 0.35,
          "end_ts": 0.46,
          "confidence": 1
        },
        {
          "type": "punct",
          "value": " "
        },
        {
          "type": "text",
          "value": "name's",
          "ts": 0.47,
          "end_ts": 0.59,
          "confidence": 1
        },
        {
          ...
        }
      ]
    },
    {
      ...
    }
  ]
}
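Because the transcript is returned as structured JSON rather than plain text, a common follow-up step is to flatten the element values into a readable string. Here is a minimal sketch, assuming the monologues/elements shape shown above (the Rev AI Node SDK also offers a getTranscriptText() method if you only need plain text):
// join all element values (words, punctuation and spaces) into a single string,
// one line per monologue
const transcriptToText = (transcript) =>
  transcript.monologues
    .map((monologue) => monologue.elements.map((element) => element.value).join(''))
    .join('\n');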
attention
Learn more about submitting an asynchronous transcription job and obtaining a transcript.
Sequence diagram
The following sequence diagram illustrates the communication between the client and the two APIs:
attention
As an alternative to custom-crafting HTTP GET and POST requests to various API endpoints and evaluating the resulting responses, this tutorial uses the Rev AI Node SDK, which provides ready-made, tested and documented methods to communicate with the different Rev AI APIs.
Step 1: Install required packages
This tutorial will use:
- The Rev AI Node SDK, to submit language identification and transcription requests to the Rev AI APIs;
- The Express web framework and body-parser middleware, to receive and parse webhook requests.
Begin by installing the required packages:
npm i revai-node-sdk express body-parser
Step 2: Create a webhook handler
The next step is to define a webhook handler within the application that receives job notifications from the APIs.
The following example demonstrates a webhook handler that receives both language identification and transcription job results from the respective APIs. If the results are successful, it performs the following additional processing:
- For language identification jobs, it obtains the list of identified languages and the most probable language, and then initiates an asynchronous transcription request that includes this language information.
- For asynchronous transcription jobs, it obtains the final transcript and prints it to the console.
To use this example, replace the <REVAI_ACCESS_TOKEN> placeholder with your Rev AI account's access token.
const { RevAiApiClient } = require('revai-node-sdk');
const bodyParser = require('body-parser');
const express = require('express');
const axios = require('axios');

const token = '<REVAI_ACCESS_TOKEN>';

// create Axios client for the Language Identification API
const http = axios.create({
  baseURL: 'https://api.rev.ai/',
  headers: {
    'Authorization': `Bearer ${token}`,
    'Content-Type': 'application/json'
  }
});

// create Rev AI API client for the Asynchronous Speech-to-Text API
const revAiClient = new RevAiApiClient(token);

// retrieve the result of a completed language identification job
const getLanguageIdentificationJobResult = async (jobId) => {
  return await http.get(`languageid/v1/jobs/${jobId}/result`,
    { headers: { 'Accept': 'application/vnd.rev.languageid.v1.0+json' } })
    .then(response => response.data)
    .catch(console.error);
};

// create Express application
const app = express();
app.use(bodyParser.json());

// define webhook handler
app.post('/hook', async (req, res) => {
  // get job, media URL, callback URL
  const job = req.body.job;
  const fileUrl = job.media_url;
  const callbackUrl = job.callback_url;
  console.log(`Received status for job id ${job.id}: ${job.status}`);

  // acknowledge receipt of the notification
  res.sendStatus(200);

  try {
    switch (job.type) {
      // language identification job result handler
      case 'language_id':
        if (job.status === 'completed') {
          const languageJobResult = await getLanguageIdentificationJobResult(job.id);
          // retrieve most probable language
          // use as input to transcription request
          const languageId = languageJobResult.top_language;
          console.log(`Received result for job id ${job.id}: language '${languageId}'`);
          const transcriptJobSubmission = await revAiClient.submitJobUrl(fileUrl, {
            language: languageId,
            callback_url: callbackUrl
          });
          console.log(`Submitted for transcription with job id ${transcriptJobSubmission.id}`);
        }
        break;
      // transcription job result handler
      case 'async':
        if (job.status === 'transcribed') {
          // retrieve transcript
          const transcriptJobResult = await revAiClient.getTranscriptObject(job.id);
          console.log(`Received transcript for job id ${job.id}`);
          // do something with transcript
          // for example: print to console
          console.log(transcriptJobResult);
        }
        break;
    }
  } catch (e) {
    console.error(e);
  }
});

// start application on port 3000
app.listen(3000, () => {
  console.log('Webhook listening');
});
Save this code listing as index.js and take a closer look at it:
- This code listing begins by importing the required packages and credentials, creating a Rev AI API client (RevAiApiClient) for the Asynchronous Speech-to-Text API, and creating an Axios HTTP client (http) for the Language Identification API.
- It starts an Express application on port 3000 and waits for incoming POST requests to the /hook URL route.
- When the application receives a POST request at /hook, it parses the incoming JSON message body, extracts the file and callback URLs, acknowledges the notification with an HTTP 200 response, and checks the job type.
- For language identification jobs (type: language_id):
  - It checks the job status and, if completed, requests the list of identified languages via the getLanguageIdentificationJobResult() function. The returned object contains a top_language property with the language code for the most probable language.
  - It submits the audio file for transcription using the Rev AI API client's submitJobUrl() method. The second argument to this method is an object containing job parameters. Here, the parameters are the webhook URL (callback_url), which is set to the current webhook URL, and the language (language), which is set to the top_language value.
- For asynchronous transcription jobs (type: async):
  - It checks the job status and, if transcribed, uses the client's getTranscriptObject() method to retrieve the complete transcript as a JSON document. This transcript can then be processed further depending on the requirements of the application. In this illustrative example, it is simply printed to the console, but in more complex scenarios it could be saved to a database, presented to the user for review, or acted upon in another way.
- Errors, if any, in the above process are sent to the console.
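The handler above acts only on successful jobs. Rev AI also invokes the webhook when a job fails, so a production application would typically check for that case too. Here is a minimal sketch of one way to do this; note that the failure and failure_detail property names are assumptions and should be checked against the job object your webhook actually receives:
// sketch: log failed jobs as well as successful ones
// ('failure' and 'failure_detail' are assumed property names, so log whatever
// diagnostic fields the job object actually carries)
const handleFailedJob = (job) => {
  if (job.status === 'failed') {
    console.error(`Job ${job.id} failed: ${job.failure_detail || job.failure}`);
    return true;
  }
  return false;
};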
Step 3: Test the webhook
To see the webhook in action, first ensure that you have replaced the placeholders as described in the previous step and then start the application using the command below.
node index.js
Next, submit an audio file for language identification to Rev AI and include the callback_url parameter in your request. This parameter specifies the webhook URL that the Rev AI API should invoke on job completion.
Here is an example of submitting an audio file with a webhook using curl.
curl -X POST "https://api.rev.ai/languageid/v1/jobs" \
-H "Authorization: Bearer <REVAI_ACCESS_TOKEN>" \
-H "Content-Type: application/json" \
-d '{"media_url":"<URL>","callback_url":"http://<WEBHOOK-HOST>/hook"}'
Replace the <REVAI_ACCESS_TOKEN> placeholder with your Rev AI access token and the <URL> placeholder with the direct URL to your audio file. Additionally, replace the <WEBHOOK-HOST> placeholder as follows:
- If you are developing and testing in the public cloud, your Express application will typically be available at a public domain or IP address. In this case, replace the <WEBHOOK-HOST> placeholder with the correct domain name or IP address, including the port number 3000 if required.
- If you are developing and testing locally, your Express application will not be available publicly and you must therefore configure a public forwarding URL using a tool like ngrok. Obtain this URL using the command ngrok http 3000 and replace the <WEBHOOK-HOST> placeholder with the temporary forwarding URL generated by ngrok.
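If you prefer to submit the test job from Node.js instead of curl, the same request can be made with axios. Here is a minimal sketch that mirrors the curl example above; replace the placeholders in the same way:
const axios = require('axios');

const token = '<REVAI_ACCESS_TOKEN>';

// submit a language identification job with a webhook callback
// (request body mirrors the curl example above)
axios.post(
  'https://api.rev.ai/languageid/v1/jobs',
  {
    media_url: '<URL>',
    callback_url: 'http://<WEBHOOK-HOST>/hook'
  },
  {
    headers: {
      'Authorization': `Bearer ${token}`,
      'Content-Type': 'application/json'
    }
  }
)
  .then((response) => console.log(`Submitted language identification job ${response.data.id}`))
  .catch(console.error);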
Once the job is processed, the Rev AI Language Identification API will send a POST request to the webhook URL. This will trigger the process described above and shortly after, the transcript will be printed to the console. The transcript can also be viewed through the Rev AI dashboard.
attention
If the webhook doesn't work as expected, you can test and inspect the webhook data.
Next steps
Learn more about Rev AI language identification, asynchronous transcription and webhook usage by visiting the following links:
- Documentation: Language Identification API overview, job submission and webhooks
- Documentation: Asynchronous Speech-To-Text API overview, job submission and webhooks
- Tutorial: Get Started with Rev AI Webhooks
- Tutorial: Use Webhooks to Trigger Job Email Notifications with Node.js, SendGrid and Express
- Documentation: Using ngrok