Understand SPE connectors for external TTS
SPE can be connected to external Text-To-Speech (TTS) services using a simple connector system. This article describes the principles and how-tos; by following these instructions you can create your own connector, which makes it possible to use a custom 3rd party TTS service via SPE.
The TTS connector should be a command line (CLI) application or script, which communicates with the external TTS service via the service's native API and with SPE via standard input (stdin) and output (stdout).
The connector behavior should be as follows:
- if the connector is started with the --info parameter, it outputs TTS service capabilities information data in JSON format to stdout
- if the connector is started without a parameter, it
  - reads input JSON data from stdin
  - outputs raw PCM signed 16-bit little-endian mono audio data to stdout
    - SPE 3.46+: with sampling frequency according to the naturalSampleRateHertz value returned in capabilities
    - SPE up to 3.45: with fixed sampling frequency 8000 Hz
- (optional) if started with the --help or -h parameter, the connector outputs basic usage text to stdout
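The behavior listed above can be sketched as a small dispatch function. This is only a minimal illustration in Python, not a reference implementation: `get_capabilities` and `synthesize` are hypothetical placeholders for the calls to the real TTS service, and the vendor name, usage text and silent audio are made-up sample values.

```python
import json


def get_capabilities():
    # Hypothetical placeholder: a real connector would query the TTS service.
    return {"apiVersion": 2, "vendor": "example", "author": "example",
            "version": "0.1", "voices": []}


def synthesize(request):
    # Hypothetical placeholder: a real connector would call the TTS service
    # and yield raw PCM chunks (signed 16-bit little-endian mono).
    yield b"\x00\x00" * 160  # 160 samples of silence


def run(argv, stdin, stdout):
    """Dispatch on arguments; stdin is a text stream, stdout a binary stream."""
    if "--help" in argv or "-h" in argv:
        stdout.write(b"usage: connector [--info | --help | -h]\n")
        return 0
    if "--info" in argv:
        stdout.write(json.dumps(get_capabilities()).encode("utf-8"))
        return 0
    request = json.load(stdin)          # input JSON from stdin
    for chunk in synthesize(request):   # raw PCM audio to stdout
        stdout.write(chunk)
    stdout.flush()
    return 0
```

In an actual connector script, the file would typically end with something like `sys.exit(run(sys.argv[1:], sys.stdin, sys.stdout.buffer))`, so that audio is written to stdout in binary mode.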
Details of the connector behavior are listed below.
TTS service capabilities information
Launching the connector with the --info parameter is expected to provide information about actual TTS service capabilities: list of voice names, supported languages and audio quality (sampling frequencies).
This info is used:
- during the SPE startup sequence: TTS connectors enabled in the SPE configuration file are started with the --info parameter and SPE reads the connector output. Connectors failing to provide the info won’t be available for use with SPE.
- when the /external/technologies/tts/info endpoint is called: all successfully initialized TTS connectors (see above) are asked to provide the capabilities information. This is intended to refresh the info from the TTS service.
NOTE: The capabilities info data (voice names, language codes, sampling frequencies) should be obtained from the actual TTS service. Returning hardcoded info instead of propagating the real capabilities of the TTS service is not a good idea: it can become incorrect over time, leading to obscure issues in applications relying on the info.
Required capabilities information JSON structure:
SPE 3.46 and newer:
{
  "apiVersion": 2,
  "vendor": string,
  "author": string,
  "version": string,
  "voices": [
    {
      "name": string,
      "languageCodes": [string, string, ...],
      "naturalSampleRateHertz": number
    },
    . . .
  ]
}

SPE 3.45 and older:
{
  "vendor": string,
  "author": string,
  "version": string,
  "voices": [
    {
      "name": string,
      "languageCodes": [string, string, ...]
    },
    . . .
  ]
}
Where:
- apiVersion denotes the version of the capabilities structure/API:
  - 2: SPE 3.46 and newer
  - the apiVersion property is not present at all for SPE 3.45 and older
- vendor is the name of the TTS provider. This name is then used in the POST /external/technologies/tts parameter
- author and version are intended for internal connector author description and versioning
- voices array should list available TTS voices:
  - voice name
  - list of languageCodes supported by that voice
  - SPE 3.46 and newer only: naturalSampleRateHertz, providing the default natural sampling rate of the audio
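As an illustration of the structure described above, the following Python sketch builds the capabilities data from a list of voices. The function name, the `spe_346_or_newer` switch and all sample values (vendor, voice name, language code, sampling rate) are assumptions made for this example; in a real connector the voice list would come from the TTS service itself.

```python
import json


def build_capabilities(vendor, author, version, voices, spe_346_or_newer=True):
    """Build the capabilities structure.

    voices is an iterable of (name, language_codes, sample_rate_hz) tuples.
    """
    result = {"vendor": vendor, "author": author, "version": version,
              "voices": []}
    if spe_346_or_newer:
        # apiVersion is present only in the SPE 3.46+ structure.
        result = {"apiVersion": 2, **result}
    for name, language_codes, sample_rate_hz in voices:
        voice = {"name": name, "languageCodes": list(language_codes)}
        if spe_346_or_newer:
            voice["naturalSampleRateHertz"] = sample_rate_hz
        result["voices"].append(voice)
    return result


# Made-up sample data, standing in for what the TTS service would report.
sample = build_capabilities("examplevendor", "Example Author", "1.0",
                            [("Alice", ["en-US", "en-GB"], 22050)])
info_json = json.dumps(sample, indent=2)
```

The `info_json` string is what the connector would write to stdout when started with --info.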
Connector input
The input JSON which should be accepted by the TTS connector from stdin is as follows:
{
  "text": string,
  "voice": {
    "name": string,
    "languageCode": string
  }
}
Where:
- text is the text to be synthesized
- name is a voice name to be used for synthesis (ref. to the voice names provided in the connector “info” data)
- languageCode is a language code defining the language to be used for synthesis (ref. to the connector “info” data)
The connector is responsible for passing the input data to the actual TTS service as needed using the service's native API, retrieving the synthesized audio data from the TTS service and writing the audio to stdout (see the Connector output section below).
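Reading and checking the input could look roughly like the Python sketch below. The function name and the choice to raise `ValueError` on malformed input are assumptions of this example; this article does not specify how a connector should report input errors to SPE.

```python
import json


def parse_request(raw):
    """Parse the connector input JSON; return (text, voice name, language code)."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    try:
        text = data["text"]
        voice = data["voice"]
        name = voice["name"]
        language_code = voice["languageCode"]
    except (KeyError, TypeError) as exc:
        raise ValueError(f"malformed connector input: {exc!r}") from exc
    fields = (("text", text), ("voice.name", name),
              ("voice.languageCode", language_code))
    for label, value in fields:
        if not isinstance(value, str):
            raise ValueError(f"{label} must be a string")
    return text, name, language_code
```

A connector would call it as `parse_request(sys.stdin.read())` and then pass the three values to the TTS service.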
TIP: The connector can be used even for playing ‘static’ messages from audio files, e.g. the text property can be used for passing the file name to be played, and the audio files can be organized in directories whose names are passed to the connector using the voice name property, or something similar.
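One possible way to implement such a static message player is sketched below in Python. Everything about the layout is an assumption of this example: the messages are taken to be 16-bit mono WAV files stored as `<base_dir>/<voice name>/<text>`, and the function name and parameters are hypothetical.

```python
import wave
from pathlib import Path


def read_static_message(base_dir, voice_name, text, sample_rate_hz=8000):
    """Return the raw PCM frames of the WAV file base_dir/voice_name/text."""
    base = Path(base_dir).resolve()
    path = (base / voice_name / text).resolve()
    # Refuse names such as ".." that would escape the messages directory.
    if base not in path.parents:
        raise ValueError("requested file is outside the messages directory")
    with wave.open(str(path), "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected a 16-bit mono WAV file")
        if wav.getframerate() != sample_rate_hz:
            raise ValueError("unexpected sampling frequency")
        # 16-bit WAV frames are already signed little-endian PCM.
        return wav.readframes(wav.getnframes())
```

The returned bytes can be written to stdout unchanged, since they are already in the raw PCM format SPE expects.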
Connector output
Output obtained from TTS service should be written by the connector to stdout as raw PCM signed 16-bit little-endian mono audio data.
In SPE 3.46 and newer, the audio sampling frequency must be set to the naturalSampleRateHertz value provided in the TTS service capabilities information.
In SPE 3.45 and older, the audio sampling frequency must be fixed to 8000 Hz.
SPE then reads the audio and writes it either to a file, or to an output realtime stream, according to the original request – see Text To Speech section of REST API documentation.
SPE reads the connector output continuously, i.e. connector can stream the audio data to the stdout as soon as it’s received from the TTS service (if the service supports streaming of the synthesized audio). This can reduce unwanted delays, especially in case of longer texts (taking longer time to synthesize).
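The output format and the streaming behavior can be sketched in Python as follows. It is assumed here, purely for illustration, that the TTS service delivers audio as floating-point samples in the range [-1.0, 1.0]; a service that already returns 16-bit little-endian PCM needs no conversion. Both function names are hypothetical.

```python
import struct


def floats_to_pcm16le(samples):
    """Convert floats in [-1.0, 1.0] to signed 16-bit little-endian PCM bytes."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def stream_audio(chunks, stdout):
    """Write each PCM chunk to the binary stream as soon as it is available."""
    total = 0
    for chunk in chunks:
        stdout.write(chunk)
        stdout.flush()  # do not hold audio back in a local buffer
        total += len(chunk)
    return total
```

In a connector, `stdout` would be `sys.stdout.buffer` and `chunks` an iterator over the data arriving from the TTS service, so SPE can start reading before synthesis has finished.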
Connector naming, location, configuration
TTS connectors should be placed in the {SPE_installation_directory}/external/technologies/tts directory, each connector in a separate subdirectory.
To enable a connector, add its subdirectory name to the external.technologies.tts_connectors setting in the SPE configuration file.
The connector executable file must be named connector (i.e. without a file extension).
Connector configuration (TTS service address, access credentials, API token, etc.) should ideally be kept in a separate configuration file, preferably named connector.properties and using a .properties-like format (to be consistent with the SPE configuration file format).
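A minimal Python sketch of reading such a file is shown below. It handles only simple `key = value` lines and comment lines starting with `#` or `!`; the key names in the sample are invented for this example, since the actual settings depend on the TTS service.

```python
def parse_properties(text):
    """Parse simple key=value lines; lines starting with '#' or '!' are comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # ignore lines without '='
        settings[key.strip()] = value.strip()
    return settings


# Hypothetical content of connector.properties.
sample_text = """
# TTS service connection (made-up keys)
tts.server.address = 127.0.0.1
tts.server.port = 5000
"""
sample_settings = parse_properties(sample_text)
```

A connector would read the file located next to its executable and pass the content to `parse_properties`.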
If all is set and configured properly, SPE should log a successful TTS connector initialization:
TTSSubsystem: Retrieving external connector info from ......./external/technologies/tts/acapela
TTSSubsystem: External connector 'acapela' from ......./external/technologies/tts/acapela has been registered.
If an error occurs, SPE logs the problem:
TTSSubsystem: Retrieving external connector info from ......./external/technologies/tts/acapela
TTSSubsystem: Cannot retrieve external connector info! ERROR: Loading configuration from "......./external/technologies/tts/acapela/connector.properties";Error: acapela server is not running or address and ports are misconfigured;