Responses
All transcript responses from the Streaming Speech-to-Text API are text messages and are returned as serialized JSON. The transcript response has two states: partial hypothesis and final hypothesis.
The JSON contains a type property that indicates what kind of response the message is. Valid values for this property are:
- "connected"
- "partial"
- "final"
The "connected" type is only returned once during the initial handshake when opening a WebSocket connection. All other responses should be of the type "partial" or "final".
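As an illustration of the type property described above, here is a minimal sketch of how a client might classify an incoming text message. It assumes the message arrives as a JSON string; the function name `classify_message` is hypothetical and not part of any Rev AI SDK.

```python
import json

# The three type values listed in this section.
VALID_TYPES = {"connected", "partial", "final"}


def classify_message(raw: str) -> str:
    """Parse a serialized JSON response and return its type value."""
    message = json.loads(raw)
    msg_type = message.get("type")
    if msg_type not in VALID_TYPES:
        # Anything else is outside what this section documents.
        raise ValueError(f"unexpected message type: {msg_type!r}")
    return msg_type
```

A client would typically call this once per received message and branch on the result, treating "connected" as a one-time handshake acknowledgement.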
Response Object
Here is a brief description of the response object and its properties:
Property Name | Type | Description |
---|---|---|
type | string | Either "partial" or "final" |
ts | double | The start time of the hypothesis in seconds |
end_ts | double | The end time of the hypothesis in seconds |
elements | array of Elements | Only present if final property is true. A list of Rev AI transcript elements covering all the words recognized up to the current point in the audio. See Transcript object for details. |
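The table above could be mirrored in client code roughly as follows. This is a sketch under stated assumptions: the `Hypothesis` class and `parse_hypothesis` helper are hypothetical names, and missing properties are mapped to `None` or an empty list rather than raising an error.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Hypothesis:
    """Client-side view of the response object described in the table."""
    type: str                      # "partial" or "final"
    ts: Optional[float] = None     # start time in seconds
    end_ts: Optional[float] = None # end time in seconds
    elements: List[Dict[str, Any]] = field(default_factory=list)


def parse_hypothesis(raw: str) -> Hypothesis:
    """Build a Hypothesis from a serialized JSON response."""
    data = json.loads(raw)
    return Hypothesis(
        type=data["type"],
        ts=data.get("ts"),
        end_ts=data.get("end_ts"),
        # elements may be absent, so default to an empty list.
        elements=data.get("elements", []),
    )
```

Keeping `elements` as plain dictionaries avoids guessing at the element schema, which is defined by the Transcript object rather than by this section.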
Partial Hypotheses
While clients are streaming audio data, Rev AI processes and returns partial hypotheses. Partial hypotheses are the AI's best guess of what was said up to that moment in time.
Multiple partial hypotheses can be returned for the same audio segment, and successive partial hypotheses may contain different words for that segment as more audio arrives (see example).
Final Hypothesis
Once the AI is confident in the transcript, a final hypothesis will be delivered. When Rev AI returns a final hypothesis, the output for that section of audio will no longer change.
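One way a client might act on this distinction is to keep only the most recent partial hypothesis as provisional state and commit each final hypothesis permanently. The sketch below assumes messages have already been decoded from JSON into dictionaries; the `TranscriptState` class is a hypothetical helper, not part of the API.

```python
from typing import Any, Dict, List, Optional


class TranscriptState:
    """Tracks committed final hypotheses plus the latest partial one."""

    def __init__(self) -> None:
        self.finals: List[Dict[str, Any]] = []
        self.pending: Optional[Dict[str, Any]] = None

    def handle(self, message: Dict[str, Any]) -> None:
        kind = message.get("type")
        if kind == "partial":
            # A newer partial supersedes the previous guess.
            self.pending = message
        elif kind == "final":
            # Final output no longer changes, so append it and
            # discard the partial it replaces.
            self.finals.append(message)
            self.pending = None
        # Other types (such as "connected") carry no transcript data.
```

A user interface built on this would render `finals` as settled text and `pending` as tentative text that may still be revised.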
These final hypotheses contain all the information of a partial hypothesis, but the elements will contain finer-grained details such as timestamp and confidence scores. The timestamp is measured in absolute time (relative to the start of the audio input).
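To illustrate how the per-element details might be consumed, the sketch below collects timed words from a final hypothesis. The element field names `value`, `ts`, `end_ts`, and `confidence` are assumptions made for this example; the authoritative element schema is the one defined by the Transcript object. The `timed_words` function is likewise hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

# (text, start seconds, end seconds, confidence)
TimedWord = Tuple[str, float, Optional[float], Optional[float]]


def timed_words(message: Dict[str, Any]) -> List[TimedWord]:
    """Return elements of a final hypothesis that carry a timestamp."""
    if message.get("type") != "final":
        return []
    words: List[TimedWord] = []
    for element in message.get("elements", []):
        if "ts" not in element:
            # Skip elements without timing (assumed: e.g. punctuation).
            continue
        words.append((
            element.get("value", ""),
            element["ts"],
            element.get("end_ts"),
            element.get("confidence"),
        ))
    return words
```

Because the timestamps are relative to the start of the audio input, the returned start times can be used directly to seek into the original recording.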
The final transcript for a completed streaming session can also be obtained via the Get Transcript endpoint of the Asynchronous Speech-to-Text API when using the JSON response schema. The availability of this transcript is subject to the normal deletion control rules.
See examples of the sequence of messages between a client and the Streaming Speech-to-Text API.