
Phonexia Speech Engine

Phonexia Speech Engine (SPE) is the main part of the Phonexia Speech Platform.
SPE is a server application for 64-bit Linux or Windows that provides a REST API to the entire portfolio of Phonexia speech technologies.

SPE capabilities overview:

  • Audio files and stream processing
    SPE processes audio files as well as RTP / HTTP streams; which input types are supported depends on the technology. Available technologies:
      Speaker Identification (SID)
      Speech To Text (STT)
      Keyword Spotting (KWS)
      Voice Activity Detection (VAD)
      Time Analysis Extraction (TAE)
      Speech Quality Estimation (SQE)
      Language Identification (LID)
      Gender Identification (GID)
      Age Estimation (AGE)
      Speaker Diarization (DIAR)

  • Results caching
    Processing results can optionally be stored in a results cache database to speed up any later re-processing of the same recording by the same technology; the results are then returned immediately from the cache instead of the whole audio file being processed again.

  • Own persistent data storage
    SPE keeps uploaded audio files in its own persistent storage space, so the original source files can be archived or deleted after upload.

  • Data privacy
    SPE keeps information about an audio file or stream only as long as the file or stream exists. Once the recording is deleted from SPE storage, or the stream ends, SPE removes all related information, metadata and technology results from the database.

  • Basic user management
    SPE allows defining multiple users with different user roles and user rights. Each SPE user has access only to their own data storage, files, metadata and processing results.

  • Load management
    SPE manages its own queue of incoming REST requests and serves them according to the available capacity of the current installation. This means that the application layer can submit any number of requests and then simply wait until they are processed.

  • Processing priority management
    To allow high-priority or low-priority processing outside the regular queue order, SPE also allows setting the priority of individual REST requests.

  • Basic audio manipulation
    SPE has built-in basic audio file manipulation functionality, such as separating individual channels from stereo recordings, cutting one recording into several files, saving audio from an incoming stream to a file, and more.

  • Stream audio player
    To support voicebot scenarios, SPE can play audio files directly to an output RTP stream.

  • External Text-to-speech (TTS) integration
    Easy integration with external TTS providers via a simple, plugin-like connector interface.

  • Flexible integration
    SPE can provide results in JSON or XML format. Results can be obtained by polling, via WebSockets, or via webhooks (callbacks).

  • Status information
    SPE can provide various status information to the application layer, e.g. license status, configuration info, current overall load, pending operations status, …
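The results cache and the data-privacy rule above can be pictured together as a lookup keyed by recording and technology, where deleting a recording also drops its cached results. The Python sketch below is purely illustrative and is not SPE's implementation: the `ResultsCache` class, its methods and the `process` callback are hypothetical names, and an in-memory dictionary stands in for the cache database.

```python
from typing import Callable, Dict, Tuple


class ResultsCache:
    """Hypothetical in-memory stand-in for a results cache database."""

    def __init__(self) -> None:
        # Maps (recording, technology) to a previously computed result.
        self._store: Dict[Tuple[str, str], dict] = {}
        self.hits = 0
        self.misses = 0

    def get_or_process(self, recording: str, technology: str,
                       process: Callable[[str, str], dict]) -> dict:
        key = (recording, technology)
        if key in self._store:
            # Same recording, same technology: return the stored result immediately.
            self.hits += 1
            return self._store[key]
        # Cache miss: run the expensive processing once and remember the result.
        self.misses += 1
        result = process(recording, technology)
        self._store[key] = result
        return result

    def forget_recording(self, recording: str) -> None:
        # Deleting a recording also removes every cached result derived from it.
        for key in [k for k in self._store if k[0] == recording]:
            del self._store[key]


if __name__ == "__main__":
    def fake_stt(recording: str, technology: str) -> dict:
        return {"recording": recording, "technology": technology}

    cache = ResultsCache()
    cache.get_or_process("call1.wav", "STT", fake_stt)  # processed
    cache.get_or_process("call1.wav", "STT", fake_stt)  # served from the cache
    print(cache.hits, cache.misses)
```

Processing the same recording with a different technology is a separate cache entry, which matches the "same recordings by the same technology" condition in the text.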
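The load and priority management items describe a queue that starts requests only up to the installation's capacity and lets priority change the order in which waiting requests start. Below is a minimal Python model of that behaviour, assuming that a higher number means a higher priority and that requests with equal priority are served in arrival order; both are assumptions rather than documented SPE semantics, and `RequestQueue`, `submit` and `finish` are hypothetical names.

```python
import heapq
import itertools
from typing import List, Tuple


class RequestQueue:
    """Toy model of a request queue limited by processing capacity."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity           # requests that may run at the same time
        self.running: List[str] = []
        self._waiting: List[Tuple[int, int, str]] = []
        self._order = itertools.count()    # tie-breaker: FIFO within one priority

    def submit(self, request_id: str, priority: int = 0) -> None:
        # heapq is a min-heap, so the priority is negated to pop the highest first.
        heapq.heappush(self._waiting, (-priority, next(self._order), request_id))
        self._dispatch()

    def finish(self, request_id: str) -> None:
        # A finished request frees capacity for the next waiting one.
        self.running.remove(request_id)
        self._dispatch()

    def waiting(self) -> List[str]:
        # Waiting requests in the order they would be started.
        return [request_id for _, _, request_id in sorted(self._waiting)]

    def _dispatch(self) -> None:
        while self._waiting and len(self.running) < self.capacity:
            _, _, request_id = heapq.heappop(self._waiting)
            self.running.append(request_id)


if __name__ == "__main__":
    queue = RequestQueue(capacity=1)
    queue.submit("a")               # starts immediately
    queue.submit("b")               # waits
    queue.submit("c", priority=5)   # waits, but ahead of "b"
    queue.finish("a")
    print(queue.running, queue.waiting())
```

With `capacity=1`, a request submitted later with a higher priority starts before earlier normal-priority requests that are still waiting, while the client itself never has to throttle its submissions.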

Quick start

The following tutorial describes the first steps with Speech Engine after you have obtained a license file from Phonexia and downloaded the Speech Engine package using the link provided by Phonexia.

In short, these are the steps as described in the tutorial:

  1. Unzip the package to a directory
  2. Copy the license file into the same directory
  3. Run phxadmin --configure-tech in a console to configure technologies
  4. Edit the settings/phxspe.properties configuration file to configure the server and, optionally, the database
  5. Optionally, run phxadmin --add-user in a console to configure user account(s) for accessing the REST API (or use the preconfigured user admin)
  6. Finally, run phxspe in a console to start Speech Engine
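For repeatable installations, the console steps of the quick start (3, 5 and 6) could be scripted. The Python sketch below only assembles and optionally runs those commands; it assumes the `phxadmin` and `phxspe` executables sit directly in the unzipped package directory on Linux, and `quick_start_commands` and `run_steps` are hypothetical helper names. Steps 1, 2 and 4 (unzipping, copying the license file and editing settings/phxspe.properties) stay manual.

```python
import subprocess
from pathlib import Path
from typing import List


def quick_start_commands(install_dir: Path, add_user: bool = False) -> List[List[str]]:
    """Return the quick-start console commands in the order they are run."""
    phxadmin = str(install_dir / "phxadmin")
    commands = [[phxadmin, "--configure-tech"]]      # step 3: configure technologies
    if add_user:
        commands.append([phxadmin, "--add-user"])    # step 5 (optional): add a REST API user
    commands.append([str(install_dir / "phxspe")])   # step 6: start Speech Engine
    return commands


def run_steps(commands: List[List[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))  # only show what would be executed
        else:
            # Run from the package directory; check=True stops at the first failing step.
            subprocess.run(cmd, check=True, cwd=Path(cmd[0]).parent)


if __name__ == "__main__":
    run_steps(quick_start_commands(Path.cwd(), add_user=True))
```

With `dry_run=False`, the last command starts the server and does not return until it stops, so it belongs at the end of the list.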

Now your SPE server is running and you can access the REST API via the IP address and port set in the properties file (settings/phxspe.properties).
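A client can derive the REST API base URL from the same properties file. The following Python sketch reads simple `key = value` lines and builds the URL; the key names `server.address` and `server.port` are hypothetical placeholders (check the actual names in settings/phxspe.properties), and the parser ignores the line continuations and escape sequences of the full .properties format.

```python
from pathlib import Path
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple 'key = value' (or 'key: value') lines into a dict."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        # Split on whichever separator, '=' or ':', appears first.
        positions = [i for i in (line.find("="), line.find(":")) if i != -1]
        if not positions:
            continue  # whitespace-only separators are not handled in this sketch
        idx = min(positions)
        props[line[:idx].strip()] = line[idx + 1:].strip()
    return props


def load_properties(path: Path) -> Dict[str, str]:
    return parse_properties(path.read_text(encoding="utf-8"))


def base_url(props: Dict[str, str]) -> str:
    # Hypothetical key names; replace them with the ones used in phxspe.properties.
    host = props["server.address"]
    port = int(props["server.port"])
    if host in ("0.0.0.0", ""):
        host = "127.0.0.1"  # a wildcard listen address is not a usable client target
    return f"http://{host}:{port}"


if __name__ == "__main__":
    settings = Path("settings/phxspe.properties")
    if settings.exists():
        print(base_url(load_properties(settings)))
```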

Steps 3 to 5 are described in detail in doc/INSTALL.html, which is included in the distribution package.

The REST API documentation is in the doc/api_reference.html file and is also available online at https://download.phonexia.com/docs/spe/.
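Once the server is running, any HTTP client can call the API. The Python sketch below uses only the standard library and assumes HTTP Basic authentication with a user configured in step 5; the `/technologies` path, the `secret` password and the `state`/`finished` fields in the polling test are hypothetical, so consult doc/api_reference.html for the real endpoints, authentication scheme and response fields. The `poll_until` helper illustrates the polling style of result retrieval mentioned in the capabilities overview.

```python
import base64
import json
import time
import urllib.request
from typing import Any, Callable


def make_request(base_url: str, path: str, user: str, password: str) -> urllib.request.Request:
    # Assumption: the REST API accepts HTTP Basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )


def fetch_json(request: urllib.request.Request, timeout: float = 10.0) -> Any:
    # Sends the request and decodes a JSON response body.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def poll_until(fetch: Callable[[], Any], is_done: Callable[[Any], bool],
               interval: float = 1.0, timeout: float = 60.0) -> Any:
    """Call fetch() repeatedly until is_done(result) is true or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("result not ready before timeout")
        time.sleep(interval)
```

For example, `poll_until(lambda: fetch_json(req), lambda r: r.get("state") == "finished")` would keep asking until the hypothetical `state` field reports completion; webhooks or WebSockets avoid this repeated querying.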

Speech Engine is actively developed and continually improved; check the SPE changelog for the latest news.

Architecture and components

SPE is an application that runs from the command line or as a service. Apart from the main binary itself, SPE requires a database, which can be either SQLite (delivered in the Phonexia package) or MySQL. No other components are needed.

Structure of technologies and technology modules

From a technical point of view, each technology can work with different technology modules, for example different languages for STT (CS_CZ4, EN_US4) or different sizes for SID (L3, XL3). A technology can work with a single module only, or with any number of installed modules.

SPE consists of the core application together with technologies and technology modules. All technologies are located in the ./bsapi subfolder. Each technology has a separate subfolder named after the technology abbreviation and can include one or more modules, which are stored further down in the directory structure.
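To check which technologies and modules an installation contains, a script can walk the ./bsapi folder. The Python sketch below takes each immediate subfolder of bsapi as a technology, as described above; because the exact module layout is not specified here, it treats every leaf directory below a technology folder as a candidate module location, which is a heuristic assumption. `list_technologies` is a hypothetical helper, and the `stt`, `sid` and `models` folders in the demo tree are invented for illustration.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def list_technologies(spe_root: Path) -> Dict[str, List[str]]:
    """Map each technology subfolder of ./bsapi to the leaf directories below it.

    Treating leaf directories as module locations is a heuristic assumption.
    """
    bsapi = spe_root / "bsapi"
    result: Dict[str, List[str]] = {}
    for tech_dir in sorted(p for p in bsapi.iterdir() if p.is_dir()):
        leaves = [
            d.relative_to(tech_dir).as_posix()
            for d in sorted(tech_dir.rglob("*"))
            # A leaf directory has no subdirectories of its own.
            if d.is_dir() and not any(child.is_dir() for child in d.iterdir())
        ]
        result[tech_dir.name] = leaves
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Hypothetical layout, only to exercise the function.
        for rel in ("bsapi/stt/models/CS_CZ4", "bsapi/stt/models/EN_US4", "bsapi/sid/models/XL3"):
            (root / rel).mkdir(parents=True)
        print(list_technologies(root))
```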