# Transcribe Audio with Automatic Language Identification

**By Vikram Vaswani, Developer Advocate - May 16, 2022**

## Introduction

Rev AI's [Asynchronous Speech-to-Text API](/api/asynchronous) can transcribe spoken audio in languages other than English: simply specify the language of the audio file in your transcription job request. However, this assumes that your application can identify the language before requesting transcription, which may not always be the case.

That's where Rev AI's [Language Identification API](/api/language-identification) comes in. This API is able to automatically identify the most probable language used in an audio file. It accepts and analyzes an input audio file and returns a list of possible languages, ranked by confidence.

A unique feature of the Language Identification API is that it performs language identification without requiring a list of possible language codes upfront. This feature eliminates the need to first acquire and validate information on language possibilities, reducing work (and code dependencies) for developers.

This tutorial explains how to integrate the Language Identification API with the Asynchronous Speech-to-Text API. It uses a webhook to create a seamless, asynchronous language identification and transcription process for use in ASR applications.

## Assumptions

This tutorial assumes that:

- You have a Rev AI account and access token. If not, [sign up for a free account](https://www.rev.ai/auth/signup) and [generate an access token](/get-started#step-1-get-your-access-token).
- You have a properly configured Node.js development environment with Node.js v16.x or v17.x. If not, [download and install Node.js](https://nodejs.org/en/download/) for your operating system.
- You have some familiarity with webhooks. If not, [learn the basics of using Rev AI API webhooks](/resources/tutorials/get-started-api-webhooks) and then [read about using webhooks to send email notifications on job completion](/resources/tutorials/send-email-notifications-webhooks).
- You have some familiarity with the [Express framework](https://expressjs.com/). If not, [familiarize yourself with the basics using this example application](https://expressjs.com/en/starter/hello-world.html).
- Your webhook will be available at a public URL. If not, or if you prefer to develop and test locally, [download and install `ngrok`](https://ngrok.com/) to generate a temporary public URL for your webhook.
- You have an audio file to transcribe. If not, use this [example audio file from Rev AI](https://www.rev.ai/FTC_Sample_1.mp3).


## Technical approach

There are two stages in performing transcription with automatic language identification.

### Stage 1: Language identification

To perform language identification on an audio file, you must submit an HTTP POST request with various parameters (including either the audio file or its URL) to the API endpoint at `https://api.rev.ai/languageid/v1/jobs`. Here is an example request:


```bash
curl -X POST "https://api.rev.ai/languageid/v1/jobs" \
     -H "Authorization: Bearer <REVAI_ACCESS_TOKEN>" \
     -H "Content-Type: application/json" \
     -d '{"source_config": {"url": "https://www.rev.ai/FTC_Sample_1.mp3"},"notification_config": {"url": "https://example.com/callback"}}'
```

When a webhook URL is included with the job parameters, as in the example above, the Language Identification API sends an HTTP POST request containing the job status to that URL when the job completes.

The webhook URL handler will receive and parse this status and, if the job was successful, it will make a GET request to the API endpoint at `https://api.rev.ai/languageid/v1/jobs/<ID>/result` to obtain the list of identified languages. The most probable language for the submitted audio file is specified in the `top_language` property of the response. Here is an example response:


```javascript
{
  "top_language": "en",
  "language_confidences": [
    {
      "language": "en",
      "confidence": 0.907
    },
    {
      "language": "nl",
      "confidence": 0.023
    }
  ]
}
```
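
The `language_confidences` list can also be used to guard against a weak identification before transcription is requested. The following is a minimal sketch of that idea and is not part of the Rev AI SDK: the `pickLanguage()` helper, the `0.5` threshold and the `'en'` fallback are all illustrative assumptions.

```javascript
// Hypothetical helper: choose a language code from a language identification result.
// The threshold and fallback values are illustrative assumptions, not API defaults.
const pickLanguage = (result, minConfidence = 0.5, fallback = 'en') => {
  const confidences = (result && result.language_confidences) || [];
  const top = confidences.find(entry => entry.language === result.top_language);
  // use the top language only when its confidence is known and high enough
  if (top && top.confidence >= minConfidence) {
    return top.language;
  }
  return fallback;
};

// same shape as the example response above
const exampleResult = {
  top_language: 'en',
  language_confidences: [
    { language: 'en', confidence: 0.907 },
    { language: 'nl', confidence: 0.023 }
  ]
};

console.log(pickLanguage(exampleResult));
```

Whether a fallback language is appropriate at all depends on the application; some applications may prefer to skip transcription or ask the user when confidence is low.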

### Stage 2: Transcription

With the language identification complete, the webhook URL handler will then trigger a new transcription request to the Asynchronous Speech-to-Text API endpoint at `https://api.rev.ai/speechtotext/v1/jobs`, passing the identified language code in the `language` parameter. Here is an example request:


```bash
curl -X POST "https://api.rev.ai/speechtotext/v1/jobs" \
     -H "Authorization: Bearer <REVAI_ACCESS_TOKEN>" \
     -H "Content-Type: application/json" \
     -d '{"source_config": {"url": "https://www.rev.ai/FTC_Sample_1.mp3"},"language":"en","notification_config": {"url": "https://example.com/callback"}}'
```

Here too, since a webhook URL is included with the job parameters, the Asynchronous Speech-to-Text API will send an HTTP POST request containing the job status to the specified webhook URL once the job is complete.
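
For applications that build this request in code rather than with `curl`, the request body can be assembled from the language identification output. This is a small sketch only: `buildTranscriptionRequest()` is a hypothetical helper, and its field names simply mirror those in the `curl` example above.

```javascript
// Hypothetical helper: build the JSON body for a transcription job request,
// mirroring the fields used in the curl example above.
const buildTranscriptionRequest = (mediaUrl, language, webhookUrl) => ({
  source_config: { url: mediaUrl },
  language,
  notification_config: { url: webhookUrl }
});

const body = buildTranscriptionRequest(
  'https://www.rev.ai/FTC_Sample_1.mp3',
  'en',
  'https://example.com/callback'
);

console.log(JSON.stringify(body));
```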

The webhook URL handler will check this status and, if it is successful, it will make a GET request to the API endpoint at `https://api.rev.ai/speechtotext/v1/jobs/<ID>/transcript` to obtain the final transcript. Here is an example transcript response from the API:


```javascript
{
  "monologues": [
    {
      "speaker": 1,
      "elements": [
        {
          "type": "text",
          "value": "Hi",
          "ts": 0.27,
          "end_ts": 0.32,
          "confidence": 1
        },
        {
          "type": "punct",
          "value": ","
        },
        {
          "type": "punct",
          "value": " "
        },
        {
          "type": "text",
          "value": "my",
          "ts": 0.35,
          "end_ts": 0.46,
          "confidence": 1
        },
        {
          "type": "punct",
          "value": " "
        },
        {
          "type": "text",
          "value": "name's",
          "ts": 0.47,
          "end_ts": 0.59,
          "confidence": 1
        },
        {
          ...
        }
      ]
    },
    {
      ...
    }
  ]
}
```

Learn more about [submitting an asynchronous transcription job](/api/asynchronous) and [obtaining a transcript](/api/asynchronous).
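
Because both `text` and `punct` elements carry their content in the `value` property, a transcript in this format can be flattened into readable text by concatenating those values. The sketch below assumes the structure shown in the example response; `transcriptToText()` is a hypothetical helper, not an SDK method.

```javascript
// Hypothetical helper: flatten a transcript object into one line of text per monologue.
const transcriptToText = (transcript) =>
  (transcript.monologues || [])
    .map(monologue => {
      // "text" and "punct" elements both carry their content in `value`
      const text = (monologue.elements || []).map(element => element.value).join('');
      return `Speaker ${monologue.speaker}: ${text}`;
    })
    .join('\n');

// the elements shown in the example response above
const sample = {
  monologues: [
    {
      speaker: 1,
      elements: [
        { type: 'text', value: 'Hi', ts: 0.27, end_ts: 0.32, confidence: 1 },
        { type: 'punct', value: ',' },
        { type: 'punct', value: ' ' },
        { type: 'text', value: 'my', ts: 0.35, end_ts: 0.46, confidence: 1 },
        { type: 'punct', value: ' ' },
        { type: 'text', value: "name's", ts: 0.47, end_ts: 0.59, confidence: 1 }
      ]
    }
  ]
};

console.log(transcriptToText(sample)); // Speaker 1: Hi, my name's
```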

### Sequence diagram

The following diagram illustrates the communication between the client and the two APIs:


```mermaid
sequenceDiagram
    Client->>Rev AI: POST /languageid  { source_config.url: /file, notification_config.url: /my/url }
    loop
        Rev AI->Rev AI: Process job
    end
    Rev AI-->>Client: POST /my/url { job: data }
    note left of Client: Process received data
    Client->>Rev AI: GET /languageid
    Rev AI->>Client: { languages: data, top_language: data }
    Client->>Rev AI: POST /speechtotext  { source_config.url: /file, notification_config.url: /my/url }
    loop
        Rev AI->Rev AI: Process job
    end
    Rev AI-->>Client: POST /my/url { job: data }
    note left of Client: Process received data
    Client->>Rev AI: GET /speechtotext
    Rev AI->>Client: { transcript: data }
```

As an alternative to custom-crafting HTTP GET and POST requests to various API endpoints and evaluating the resulting responses, this tutorial uses the Rev AI Node SDK, which provides ready-made, tested and documented methods to communicate with the different Rev AI APIs. The one exception is retrieving the language identification result, for which the example code uses an Axios HTTP client.

## Step 1: Install required packages

This tutorial will use:

- The [Rev AI Node SDK](/sdk/node), to submit transcription requests to the Rev AI Asynchronous Speech-to-Text API;
- The [Axios HTTP client](https://www.npmjs.com/package/axios), to retrieve results from the Language Identification API;
- The [Express Web framework](https://expressjs.com/) and [body-parser middleware](https://www.npmjs.com/package/body-parser), to receive and parse webhook requests.


Begin by installing the required packages:


```bash
npm i revai-node-sdk express body-parser axios
```

## Step 2: Create a webhook handler

The next step is to define a webhook handler within the application that receives job notifications from the APIs.

The following example demonstrates a webhook handler that receives both language identification and transcription job results from the respective APIs. If the results are successful, it performs the following additional processing:

- For language identification jobs, it obtains the list of identified languages and the most probable language, and then initiates an asynchronous transcription request that includes this language information.
- For asynchronous transcription jobs, it obtains the final transcript and prints it to the console.


To use this example, replace the `<REVAI_ACCESS_TOKEN>` placeholder with your Rev AI account's access token.


```javascript
const { RevAiApiClient } = require('revai-node-sdk');
const bodyParser = require('body-parser');
const express = require('express');
const axios = require('axios');

const token = '<REVAI_ACCESS_TOKEN>';

// create Axios client
const http = axios.create({
  baseURL: 'https://api.rev.ai/',
  headers: {
    'Authorization': `Bearer ${token}`,
    'Content-Type': 'application/json'
  }
});

// create Rev AI API client
const revAiClient = new RevAiApiClient(token);

// retrieve the result of a completed language identification job
// errors propagate to the caller, which logs them
const getLanguageIdentificationJobResult = async (jobId) => {
  const response = await http.get(`languageid/v1/jobs/${jobId}/result`,
    { headers: { 'Accept': 'application/vnd.rev.languageid.v1.0+json' } });
  return response.data;
};

// create Express application
const app = express();
app.use(bodyParser.json());

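// define webhook handler
app.post('/hook', async (req, res) => {
  // acknowledge the notification right away;
  // without a response, the incoming request would hang until it times out
  res.sendStatus(200);

  // get job, media URL, callback URL
  const job = req.body.job;
  const fileUrl = job.media_url;
  const callbackUrl = job.callback_url;
  console.log(`Received status for job id ${job.id}: ${job.status}`);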

  try {
    switch (job.type) {
      // language job result handler
      case 'language_id':
        if (job.status === 'completed') {
          const languageJobResult = await getLanguageIdentificationJobResult(job.id);
          // retrieve most probable language
          // use as input to transcription request
          const languageId = languageJobResult.top_language;
          console.log(`Received result for job id ${job.id}: language '${languageId}'`);
          const transcriptJobSubmission = await revAiClient.submitJobUrl(fileUrl, {
            language: languageId,
            callback_url: callbackUrl
          });
          console.log(`Submitted for transcription with job id ${transcriptJobSubmission.id}`);
        }
        break;
      // transcription job result handler
      case 'async':
        if (job.status === 'transcribed') {
          // retrieve transcript
          const transcriptJobResult = await revAiClient.getTranscriptObject(job.id);
          console.log(`Received transcript for job id ${job.id}`);
          // do something with transcript
          // for example: print to console
          console.log(transcriptJobResult);
        }
        break;
    }
  } catch (e) {
    console.error(e);
  }
});


//  start application on port 3000
app.listen(3000, () => {
  console.log('Webhook listening');
})
```

Save this code listing as `index.js` and take a closer look at it:

- This code listing begins by importing the required packages and credentials and creating a `RevAiApiClient` instance named `revAiClient` for the Asynchronous Speech-to-Text API. It also creates an Axios HTTP client named `http` for the Language Identification API.
- It starts an Express application on port 3000 and waits for incoming POST requests to the `/hook` URL route.
- When the application receives a POST request at `/hook`, it parses the incoming JSON message body, extracts the file and callback URLs and checks the job `type`.
- For language identification jobs (`type: language_id`):
  - It checks the job `status` and if `completed`, it requests the list of identified languages via the `getLanguageIdentificationJobResult()` function. The returned object contains a `top_language` property with the language code for the most probable language.
  - It submits the audio file for transcription using the Rev AI API client's `submitJobUrl()` method. The second argument to this method is an object containing job parameters. Here, the parameters are the webhook URL (`callback_url`), which is set to the current webhook URL, and the language (`language`), which is set to the `top_language` value.
- For asynchronous transcription jobs (`type: async`):
  - It checks the job `status` and if `transcribed`, it uses the client's `getTranscriptObject()` method to retrieve the complete transcript as a JSON document. This transcript can then be processed further depending on the requirements of the application. In this illustrative example, it is simply sent to the console but for more complex scenarios, it could be saved to a database, presented to the user for review, or acted upon in a different way.
- Errors, if any, in the above process are sent to the console.
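
The branching described in the list above can be isolated into a small pure function, which makes it easier to test without running a server. This is a sketch rather than part of the listing: `nextAction()` and its return values are hypothetical names, and handling a `failed` status separately is an assumption that the original handler does not make.

```javascript
// Hypothetical helper: map a webhook job notification to the next action.
// The job types and statuses mirror those checked in the handler above;
// treating 'failed' as a separate outcome is an assumption of this sketch.
const nextAction = (job) => {
  if (!job || !job.type) return 'ignore';
  if (job.status === 'failed') return 'report_failure';
  if (job.type === 'language_id' && job.status === 'completed') return 'submit_transcription';
  if (job.type === 'async' && job.status === 'transcribed') return 'fetch_transcript';
  return 'ignore';
};

console.log(nextAction({ id: 'abc', type: 'language_id', status: 'completed' }));
console.log(nextAction({ id: 'def', type: 'async', status: 'transcribed' }));
```

A handler built this way could call `nextAction(req.body.job)` and switch on the result, keeping the network calls separate from the decision logic.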


## Step 3: Test the webhook

To see the webhook in action, first ensure that you have replaced the placeholders as described in the previous step and then start the application using the command below.


```bash
node index.js
```

Next, submit an audio file for language identification to Rev AI and include the `callback_url` parameter in your request. This parameter specifies the webhook URL that the Rev AI API should invoke on job completion.

Here is an example of submitting an audio file with a webhook using `curl`.


```bash
curl -X POST "https://api.rev.ai/languageid/v1/jobs" \
     -H "Authorization: Bearer <REVAI_ACCESS_TOKEN>" \
     -H "Content-Type: application/json" \
     -d '{"media_url":"<URL>","callback_url":"http://<WEBHOOK-HOST>/hook"}'
```

Replace the `<REVAI_ACCESS_TOKEN>` placeholder with your Rev AI access token and the `<URL>` placeholder with the direct URL to your audio file. Additionally, replace the `<WEBHOOK-HOST>` placeholder as follows:

- If you are developing and testing in the public cloud, your Express application will typically be available at a public domain or IP address. In this case, replace the `<WEBHOOK-HOST>` placeholder with the correct domain name or IP address, including the port number `3000` if required.
- If you are developing and testing locally, your Express application will not be available publicly and you must therefore configure a public forwarding URL using a tool like `ngrok`. Obtain this URL using the command `ngrok http 3000` and replace the `<WEBHOOK-HOST>` placeholder with the temporary forwarding URL generated by `ngrok`.


Once the job is processed, the Rev AI Language Identification API will send a POST request to the webhook URL. This will trigger the process described above and shortly after, the transcript will be printed to the console. The transcript can also be viewed through the Rev AI dashboard.

If the webhook doesn't work as expected, you can [test and inspect the webhook data](/resources/tutorials/get-started-api-webhooks).

## Next steps

Learn more about Rev AI language identification, asynchronous transcription and webhook usage by visiting the following links:

- Documentation: Language Identification API [overview](/api/language-identification), [job submission](/api/language-identification) and [webhooks](/api/language-identification/webhooks)
- Documentation: Asynchronous Speech-To-Text API [overview](/api/asynchronous), [job submission](/api/asynchronous) and [webhooks](/api/asynchronous/webhooks)
- Tutorial: [Get Started with Rev AI Webhooks](/resources/tutorials/get-started-api-webhooks)
- Tutorial: [Use Webhooks to Trigger Job Email Notifications with Node.js, SendGrid and Express](/resources/tutorials/send-email-notifications-webhooks)
- Documentation: [Using `ngrok`](https://ngrok.com/docs#getting-started-expose)