# Overview

There are two ways to interact with the Streaming Speech-to-Text API:

- WebSocket protocol
- RTMP streams


For a non-streaming solution, refer to the [Asynchronous Speech-to-Text API](/api/asynchronous) documentation.

## WebSocket protocol

### API endpoint

All connections to Rev AI's Streaming Speech-to-Text API start as a WebSocket handshake HTTP request to `wss://api.rev.ai/speechtotext/v1/stream`. On successful authorization, the client can start sending binary WebSocket messages containing audio data in one of the supported formats. As speech is detected, Rev AI returns hypotheses of the recognized speech content.

The base URL is different from the base URL for the Asynchronous Speech-to-Text API.

#### Example


```bash
wss://api.rev.ai/speechtotext/v1/stream?access_token=<REVAI_ACCESS_TOKEN>&content_type=audio/x-raw;layout=interleaved;rate=16000;format=S16LE;channels=1&metadata=<METADATA>
```
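
The connection URL above can also be assembled programmatically. The following is a minimal sketch, assuming Python's standard library only; `build_stream_url`, `RAW_16K_MONO` and `_to_query_value` are hypothetical names introduced for illustration and are not part of any Rev AI SDK. It keeps `;`, `=` and `/` unescaped so that the result matches the example URL.

```python
from urllib.parse import quote, urlencode

STREAM_BASE_URL = "wss://api.rev.ai/speechtotext/v1/stream"
# Content type taken from the example URL: 16 kHz, mono, signed 16-bit little-endian raw audio.
RAW_16K_MONO = "audio/x-raw;layout=interleaved;rate=16000;format=S16LE;channels=1"


def _to_query_value(value):
    # Booleans are written in lowercase, like the `false` defaults of the boolean parameters.
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def build_stream_url(access_token, content_type, **options):
    """Return the WebSocket URL with the required and any optional query parameters."""
    params = {"access_token": access_token, "content_type": content_type}
    for name, value in options.items():
        if value is not None:  # skip options that were left unset
            params[name] = _to_query_value(value)
    # safe=";=/" leaves the content type readable, as in the documented example.
    return STREAM_BASE_URL + "?" + urlencode(params, quote_via=quote, safe=";=/")


print(build_stream_url("TOKEN", RAW_16K_MONO, metadata="demo"))
```

Optional parameters such as `language` or `filter_profanity` can be passed as keyword arguments in the same way.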

### Requests

A WebSocket request consists of the following parts:

|  | Request parameter | Required | Default |
|  --- | --- | --- | --- |
| Base URL |  | Yes |  |
| [Access token](/api/streaming/requests#access-token) | `access_token` | Yes | None |
| [Content type](/api/streaming/requests#content-type) | `content_type` | Yes | None |
| [Language](/api/streaming/requests#language) | `language` | No | `en` |
| [Metadata](/api/streaming/requests#metadata) | `metadata` | No | None |
| [Custom vocabulary](/api/streaming/requests#custom-vocabulary) | `custom_vocabulary_id` | No | None |
| [Profanity filter](/api/streaming/requests#profanity-filter) | `filter_profanity` | No | `false` |
| [Disfluencies](/api/streaming/requests#disfluencies) | `remove_disfluencies` | No | `false` |
| [Delete after seconds](/api/streaming/requests#delete-after-seconds) | `delete_after_seconds` | No | None |
| [Detailed partials](/api/streaming/requests#detailed-partials) | `detailed_partials` | No | `false` |
| [Start timestamp](/api/streaming/requests#start-timestamp) | `start_ts` | No | None |
| [Maximum Segment Duration Seconds](/api/streaming/requests#maximum-segment-duration-seconds) | `max_segment_duration_seconds` | No | None |
| [Transcriber](/api/streaming/requests#transcriber) | `transcriber` | No | See transcriber section |
| [Speaker Switch](/api/streaming/requests#speaker-switch-detection) | `enable_speaker_switch` | No | `false` |
| [Skip Post-processing](/api/streaming/requests#skip-post-processing) | `skip_postprocessing` | No | `false` |
| [Priority](/api/streaming/requests#priority) | `priority` | No | `speed` |
| [Maximum wait time for connection](/api/streaming/requests#maximum-wait-time-for-connection) | `max_connection_wait_seconds` | No | `60` |


[Learn more about request parameters](/api/streaming/requests).

### Responses

All transcript responses from the Streaming Speech-to-Text API are WebSocket text messages containing serialized JSON. A transcript response is either a `partial` hypothesis or a `final` hypothesis. Each JSON message contains a `type` property that identifies the kind of response.

[Learn more about responses](/api/streaming/responses).
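
A client typically routes each incoming text message by its `type` property. The sketch below is illustrative only: `handle_message` and its callbacks are hypothetical names, and the rest of the message is treated as opaque because its fields are described on the responses page rather than here.

```python
import json


def handle_message(raw_text, on_partial, on_final):
    """Parse one text message and dispatch it by its `type` property.

    Returns the message type so the caller can react to other kinds of
    messages (for example `connected`) as well.
    """
    message = json.loads(raw_text)
    kind = message.get("type")
    if kind == "partial":
        on_partial(message)  # interim hypothesis, may still change
    elif kind == "final":
        on_final(message)  # settled hypothesis
    return kind


partials, finals = [], []
handle_message('{"type": "partial"}', partials.append, finals.append)
handle_message('{"type": "final"}', partials.append, finals.append)
print(len(partials), len(finals))
```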

### API limits

The following limits are in place for the Streaming Speech-to-Text API:

- Streaming concurrency limit is 10.
- Time limit per stream is 3 hours.


When your stream approaches the 3-hour limit, you should initialize a new concurrent WebSocket connection. Once your WebSocket connection is accepted and the `"connected"` type message is received, you can switch to the new WebSocket and begin streaming audio to it.
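
One way to decide when to open the replacement connection is to compare the stream's age against the limit minus a safety margin. This is a rough sketch; the five-minute margin and the name `should_start_new_stream` are assumptions made for illustration, not values recommended by Rev AI.

```python
import time

STREAM_LIMIT_SECONDS = 3 * 60 * 60  # 3-hour limit per stream
ROLLOVER_MARGIN_SECONDS = 5 * 60  # assumed safety margin, tune to your needs


def should_start_new_stream(started_at, now=None, margin=ROLLOVER_MARGIN_SECONDS):
    """Return True once the stream is within `margin` seconds of the time limit.

    `started_at` and `now` are readings of the same monotonic clock.
    """
    if now is None:
        now = time.monotonic()
    return (now - started_at) >= STREAM_LIMIT_SECONDS - margin


started_at = time.monotonic()
print(should_start_new_stream(started_at))
```

When the check returns `True`, open the new connection, wait for its `"connected"` message, and only then redirect audio to it.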

The concurrency limit is configurable by Rev AI support. To adjust this limit, contact the support team at [support@rev.ai](mailto:support@rev.ai).

## RTMP streams

RTMP streams are not supported in a HIPAA context.

### API endpoint

The base URL is different from the base URL for the Asynchronous Speech-to-Text API.

All Real-Time Messaging Protocol (RTMP) streaming connections to Rev AI's Streaming Speech-to-Text API start as a POST HTTP request to `https://api.rev.ai/speechtotext/v1/live_stream/rtmp` with the user's access token as a Bearer authentication token. Users should include their intended job options in this HTTP POST request.

On successful authorization, the API returns a JSON object containing `read_url` and `ingestion_url` URL endpoints and a `stream_name` value for the session. The `ingestion_url` URL will have the correct query parameters and values for the job as specified by the user.

The client can now make a WebSocket connection to the `read_url` to receive streaming results and then begin streaming audio to the RTMP `ingestion_url` provided in the response using the provided `stream_name` as the stream name for that session. As speech is detected, Rev AI returns hypotheses of the recognized speech content.
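
The session setup described above can be sketched with Python's standard library. This is an illustration under assumptions: sending the job options as a JSON body is an assumption (this page does not specify the encoding), and `build_rtmp_request` and `parse_rtmp_session` are hypothetical helper names. The request is only constructed here, not sent.

```python
import json
import urllib.request

RTMP_ENDPOINT = "https://api.rev.ai/speechtotext/v1/live_stream/rtmp"


def build_rtmp_request(access_token, job_options=None):
    """Build the POST request that starts an RTMP streaming session."""
    # Assumption: job options are sent as a JSON document in the request body.
    body = json.dumps(job_options or {}).encode("utf-8")
    return urllib.request.Request(
        RTMP_ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


def parse_rtmp_session(response_body):
    """Extract the three values the API returns on successful authorization."""
    payload = json.loads(response_body)
    return payload["read_url"], payload["ingestion_url"], payload["stream_name"]


request = build_rtmp_request("TOKEN", {"language": "en"})
print(request.get_method(), request.full_url)
```

To send the request, pass it to `urllib.request.urlopen` and feed the response body to `parse_rtmp_session`. Then connect to `read_url` over WebSocket before streaming audio to `ingestion_url`.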

#### Example


```bash
wss://api.rev.ai/speechtotext/v1/read_stream?read_token=<GENERATED_READ_TOKEN>
```

### Requests

A WebSocket request consists of the following parts:

|  | Request parameter | Required | Default |
|  --- | --- | --- | --- |
| `read_url` URL |  | Yes |  |
| [Access token](/api/streaming/requests#access-token) | `access_token` | Yes | None |
| [Content type](/api/streaming/requests#content-type) | `content_type` | Yes | None |
| [Language](/api/streaming/requests#language) | `language` | No | `en` |
| [Metadata](/api/streaming/requests#metadata) | `metadata` | No | None |
| [Custom vocabulary](/api/streaming/requests#custom-vocabulary) | `custom_vocabulary_id` | No | None |
| [Profanity filter](/api/streaming/requests#profanity-filter) | `filter_profanity` | No | `false` |
| [Disfluencies](/api/streaming/requests#disfluencies) | `remove_disfluencies` | No | `false` |
| [Delete after seconds](/api/streaming/requests#delete-after-seconds) | `delete_after_seconds` | No | None |
| [Detailed partials](/api/streaming/requests#detailed-partials) | `detailed_partials` | No | `false` |
| [Start timestamp](/api/streaming/requests#start-timestamp) | `start_ts` | No | None |
| [Maximum Segment Duration Seconds](/api/streaming/requests#maximum-segment-duration-seconds) | `max_segment_duration_seconds` | No | None |
| [Transcriber](/api/streaming/requests#transcriber) | `transcriber` | No | See transcriber section |
| [Speaker Switch](/api/streaming/requests#speaker-switch-detection) | `enable_speaker_switch` | No | `false` |
| [Skip Post-processing](/api/streaming/requests#skip-post-processing) | `skip_postprocessing` | No | `false` |
| [Priority](/api/streaming/requests#priority) | `priority` | No | `speed` |
| [Maximum wait time for connection](/api/streaming/requests#maximum-wait-time-for-connection) | `max_connection_wait_seconds` | No | `60` |


[Learn more about request parameters](/api/streaming/requests).

### Responses

All transcript responses from the Streaming Speech-to-Text API are WebSocket text messages containing serialized JSON. A transcript response is either a `partial` hypothesis or a `final` hypothesis. Each JSON message contains a `type` property that identifies the kind of response.

[Learn more about responses](/api/streaming/responses).

### API limits

The following limits are in place for the Streaming Speech-to-Text API:

- Streaming concurrency limit is 10.
- Time limit per stream is 3 hours.


When your stream approaches the 3-hour limit, you must request and obtain a new `ingestion_url`, `read_url` and `stream_name`. You should then initialize a new concurrent WebSocket connection to the new `read_url` endpoint. Once your WebSocket connection is accepted and the `"connected"` type message is received, you can switch to the new `ingestion_url` endpoint and begin streaming RTMP audio to it.

The concurrency limit is configurable by Rev AI support. To adjust this limit, contact the support team at [support@rev.ai](mailto:support@rev.ai).

## Formats

Although the Streaming Speech-to-Text API technically supports all the [formats supported by FFmpeg](https://ffmpeg.org/general.html#File-Formats), it is recommended to send audio streams as [raw audio, FLAC or WAV](/api/streaming/requests#content-type) as other formats can result in slightly increased latency and inconsistent results.

## HIPAA compliance

The API supports HIPAA-compliant processing. However, this feature is not activated by default and must be explicitly activated at the account level. Learn more about [Rev AI's HIPAA compliance and how to HIPAA-enable a Rev AI user account](/api/hipaa).

The API has the following limitations in a HIPAA context:

1. [RTMP streams](/api/streaming/requests#rtmp-streams) are not supported.


## Error codes

WebSocket close messages carry a status code that signals why the socket connection was closed. See [RFC 6455](https://tools.ietf.org/html/rfc6455#section-7.4.1) for the predefined status codes.

In addition to these error codes, the following table defines Rev AI custom error codes in the `4xxx` range. Some errors can be resolved simply by retrying the request. The table indicates which errors are likely to be resolved with successive retries.

| Error Code | Description | Retry? |
|  --- | --- | --- |
| 4001 | Unauthorized. Returned when the provided access token is invalid. | No |
| 4002 | Bad request. Returned when the connection’s `content_type` is invalid, `metadata` contains too many characters, or no custom vocabulary exists with the given `id`. | No |
| 4003 | Insufficient credits. Returned when the client does not have enough credits to continue the streaming session. | No |
| 4010 | Server shutting down. The connection was terminated due to the server shutting down. | Yes |
| 4013 | No instance available. No available streaming instances were found. Retry the connection later. | Yes |
| 4029 | Too many requests. The number of concurrent connections exceeded the limit. Contact customer support to increase it. | No |


It is recommended that the maximum number of retries be limited to 5 attempts per request.
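
The retry guidance above can be expressed as a small loop. In this sketch, `open_stream` is a hypothetical callable that runs one connection attempt and returns the WebSocket close code it ended with; waiting between attempts is left out for brevity and would normally be added.

```python
RETRYABLE_CLOSE_CODES = {4010, 4013}  # codes marked "Yes" in the table above
MAX_RETRIES = 5  # recommended upper bound on retries per request


def run_with_retries(open_stream, max_retries=MAX_RETRIES):
    """Call `open_stream` until it ends with a non-retryable code or retries run out.

    Returns a tuple of (last close code, number of retries performed).
    """
    retries = 0
    while True:
        close_code = open_stream()
        if close_code not in RETRYABLE_CLOSE_CODES or retries >= max_retries:
            return close_code, retries
        retries += 1


# Simulated attempts: two retryable failures, then a normal closure (1000).
codes = iter([4013, 4010, 1000])
print(run_with_retries(lambda: next(codes)))
```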

## Billing

For billing purposes, we track two values during each stream: stream duration and audio duration. At the end of each stream, you are charged for the larger of the two, rounded up to the nearest second, with an absolute minimum of 15 seconds.

- Audio duration (AD) is the number of seconds of audio sent over the WebSocket.
- Stream duration (SD) is the number of real-world seconds elapsed since the WebSocket connection was established.

Here are some examples:

- AD: 4.1 seconds, SD: 4.1 seconds. Charged as 15 seconds.
- AD: 14.1 seconds, SD: 14.1 seconds. Charged as 15 seconds.
- AD: 15 seconds, SD: 15 seconds. Charged as 15 seconds.
- AD: 15 seconds, SD: 16 seconds. Charged as 16 seconds.
- AD: 16.1 seconds, SD: 16.1 seconds. Charged as 17 seconds.
- AD: 24.7 seconds, SD: 14 seconds. Charged as 25 seconds.
- AD: 14 seconds, SD: 24 seconds. Charged as 24 seconds.
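
The billing rule can be written as a one-line calculation. This is a sketch of the rule as stated on this page, not Rev AI's billing implementation; `billed_seconds` is a hypothetical name.

```python
import math

MINIMUM_BILLED_SECONDS = 15


def billed_seconds(audio_duration, stream_duration):
    """Larger of the two durations, rounded up to a whole second, at least 15."""
    longest = max(audio_duration, stream_duration)
    return max(MINIMUM_BILLED_SECONDS, math.ceil(longest))


print(billed_seconds(4.1, 4.1))    # below the minimum
print(billed_seconds(16.1, 16.1))  # rounded up to the next second
print(billed_seconds(14, 24))      # stream duration dominates
```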


[Learn more about billing and credits](/api/streaming/billing).