All transcript responses from the Streaming Speech-to-Text API are text messages containing serialized JSON. A transcript response is in one of two states: partial hypothesis or final hypothesis.

The JSON contains a type property that indicates the kind of response the message is. Valid values for this type property are:

  • "connected"
  • "partial"
  • "final"

The "connected" type is only returned once during the initial handshake when opening a WebSocket connection. All other responses should be of the type "partial" or "final".
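As a minimal sketch of how a client might branch on the type property, the Python snippet below parses one text message and checks it against the three values listed above. The names parse_message and VALID_TYPES are hypothetical helpers introduced for illustration; they are not part of the API.

```python
import json
from typing import Tuple

# The three values of the "type" property listed in this section.
VALID_TYPES = ("connected", "partial", "final")


def parse_message(raw: str) -> Tuple[str, dict]:
    """Decode one serialized JSON text message and return (type, message)."""
    message = json.loads(raw)
    kind = message.get("type")
    if kind not in VALID_TYPES:
        # Anything else is unexpected according to the list above.
        raise ValueError(f"unexpected message type: {kind!r}")
    return kind, message


if __name__ == "__main__":
    kind, message = parse_message('{"type": "partial", "ts": 0.0, "end_ts": 1.2}')
    if kind == "connected":
        print("handshake complete")
    else:
        print(f"{kind} hypothesis from {message['ts']}s to {message['end_ts']}s")
```

Raising on an unknown type is a design choice of this sketch; a production client may prefer to log and ignore such messages.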

Response Object

Here is a brief description of the response object and its properties:

Property Name   Type                Description
type            string              Either "partial" or "final"
ts              double              The start time of the hypothesis in seconds
end_ts          double              The end time of the hypothesis in seconds
elements        array of Elements   Only present if final property is true. A list of Rev AI transcript element properties covering all the recognized words up to the current point in the audio. See Transcript object for details.
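The properties above map naturally onto a small typed record. The following Python sketch is one possible client-side representation, not an official SDK type: the Hypothesis class, its duration helper, and parse_hypothesis are hypothetical names. Elements are kept as plain dictionaries because their fields are defined by the Transcript object rather than by this section.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Hypothesis:
    type: str      # "partial" or "final"
    ts: float      # start time of the hypothesis, in seconds
    end_ts: float  # end time of the hypothesis, in seconds
    # Left empty when the message carries no "elements" property.
    elements: List[Dict[str, Any]] = field(default_factory=list)

    @property
    def duration(self) -> float:
        """Length of the audio span covered by this hypothesis, in seconds."""
        return self.end_ts - self.ts


def parse_hypothesis(raw: str) -> Hypothesis:
    """Build a Hypothesis from one serialized JSON response."""
    data = json.loads(raw)
    return Hypothesis(
        type=data["type"],
        ts=float(data["ts"]),
        end_ts=float(data["end_ts"]),
        elements=list(data.get("elements", [])),
    )
```

Reading elements with a default keeps the parser tolerant of messages that omit the property.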

Partial Hypotheses

While clients are streaming audio data, Rev AI processes and returns partial hypotheses. Partial hypotheses are the AI's best guess of what was said up to that moment in time.

Multiple partial hypotheses can be returned for the same audio segment, and successive partial hypotheses may contain different words for that segment (see example).
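Because a later partial hypothesis can revise an earlier one, a client would typically keep only the most recent partial for display and commit text once a final hypothesis arrives. The Python sketch below illustrates that bookkeeping. It assumes that every hypothesis carries its words in an elements list whose entries each have a value string holding one word; that shape, the LiveTranscript class, and the sample messages are assumptions of this sketch and are not stated in this section.

```python
import json
from typing import List


class LiveTranscript:
    """Keeps committed final text plus the latest, still-changing partial."""

    def __init__(self) -> None:
        self.finals: List[str] = []
        self.pending: str = ""

    @staticmethod
    def _text(message: dict) -> str:
        # Assumption: each element is a dict with a "value" string (one word).
        return " ".join(e["value"] for e in message.get("elements", []))

    def feed(self, raw: str) -> None:
        """Update state from one serialized JSON response."""
        message = json.loads(raw)
        kind = message.get("type")
        if kind == "partial":
            # A newer partial replaces the previous guess for this segment.
            self.pending = self._text(message)
        elif kind == "final":
            # Final text no longer changes, so commit it and clear the guess.
            self.finals.append(self._text(message))
            self.pending = ""
        # Any other type (for example "connected") carries no transcript text.

    def display(self) -> str:
        parts = self.finals + ([self.pending] if self.pending else [])
        return " ".join(parts)
```

Feeding two partials followed by a final for the same segment leaves a single committed entry, since each partial overwrites the one before it.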

Final Hypothesis

Once the AI is confident in the transcript, a final hypothesis will be delivered. When Rev AI returns a final hypothesis, the output for that section of audio will no longer change.

A final hypothesis contains all the information of a partial hypothesis, but its elements contain finer-grained details such as timestamps and confidence scores. Timestamps are measured in absolute time (relative to the start of the audio input).
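The per-element timestamps and confidence scores make it possible to flag words that may need review. The Python sketch below shows one way to do that. The element field names value, ts, end_ts, and confidence are assumptions made for illustration (the authoritative field list is in the Transcript object), and low_confidence_words and its threshold are hypothetical.

```python
from typing import Any, Dict, List


def low_confidence_words(final_message: Dict[str, Any],
                         threshold: float = 0.8) -> List[Dict[str, Any]]:
    """Return words in a final hypothesis whose confidence is below threshold.

    Assumed element fields: "value", "ts", "end_ts", "confidence".
    Elements without a confidence score are skipped.
    """
    flagged = []
    for element in final_message.get("elements", []):
        confidence = element.get("confidence")
        if confidence is not None and confidence < threshold:
            flagged.append({
                "word": element.get("value"),
                # Times are absolute, measured from the start of the audio.
                "start": element.get("ts"),
                "end": element.get("end_ts"),
                "confidence": confidence,
            })
    return flagged
```

Because the timestamps are relative to the start of the audio input, the returned start and end values can be used directly to seek into a recording of the stream.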

The final transcript for a completed streaming session can also be obtained via the Get Transcript endpoint of the Asynchronous Speech-to-Text API when using the JSON response schema. The availability of this transcript is subject to the normal deletion control rules.


See examples of the sequence of messages between a client and the Streaming Speech-to-Text API.