Phonexia Speech Engine

Phonexia Speech Engine (SPE) is a server application that provides a REST API to the entire portfolio of Phonexia speech technologies.

SPE capabilities overview:

  • Audio files and stream processing
    Inputs: audio files, RTP / HTTP streams
    Technologies:
    • Speaker Identification (SID)
    • Speech To Text (STT)
    • Keyword Spotting (KWS)
    • Voice Activity Detection (VAD)
    • Time Analysis Extraction (TAE)
    • Speech Quality Estimation (SQE)
    • Language Identification (LID)
    • Gender Identification (GID)
    • Age Estimation (AGE)
    • Speaker Diarization (DIAR)
  • Results caching
    Processing results can optionally be stored in a results cache database to speed up re-processing of the same recording by the same technology. Cached results are returned immediately instead of processing the audio file again.
  • Own persistent data storage
    SPE keeps uploaded audio files in its own persistent storage space, so the original source files can be archived or deleted after upload.
  • Data privacy
    SPE keeps information about an audio file or stream only as long as that file or stream exists. Once a recording is deleted from SPE storage, or a stream ends, SPE removes all related information, metadata and technology results from the database.
  • Basic user management
    SPE supports multiple users with different user roles and user rights. Each SPE user has access only to their own data storage, files, metadata and processing results.
  • Load management
    SPE manages its own queue of incoming REST requests and serves them according to the available capacity of the current installation. The application layer can therefore submit any number of requests and simply wait until they are processed.
  • Processing priority management
    To allow high-priority or low-priority processing outside the normal queue order, SPE also lets you set a priority for individual REST requests.
  • Basic audio manipulation
    SPE has built-in functionality for basic audio file manipulation, such as separating individual channels from stereo recordings, cutting one audio file into several files, and saving audio from an incoming stream to a file, among others.
  • Stream audio player
    To support voicebot scenarios, SPE can play audio files directly to an output RTP stream.
  • External Text-to-speech (TTS) integration
    Easy integration with external TTS providers via a simple plugin-like connector interface.
  • Flexible integration
    SPE can provide results in JSON or XML format. Results can be obtained by polling, via WebSockets, or via webhooks (callbacks).
  • Status information
    SPE can provide various status information to the application layer, e.g. license status, configuration info, current overall load, pending operations status, …
  • Technologies Available

    The Speech Engine can include a single technology or a combination of the following technologies, depending on configuration.

    Main technologies providing high value:

    Basic technologies that help reach accurate results:

    • Gender Identification (GID)
    • Age Estimation (AGE)
    • Voice Activity Detection (VAD)
    • Speech Quality Estimator (SQE)
    • Time Analysis Extraction (TAE)
    • Phoneme Recognizer (PHNREC) – multiple languages supported
    • Waveform Denoiser (DENOISER)
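
The load management and processing priority capabilities listed above can be pictured as a priority queue sitting in front of a limited pool of workers. The Python sketch below is illustrative only and is not SPE code or its REST API: the `RequestQueue` class, its method names, and the convention that a lower number means higher priority are all assumptions made for this example.

```python
import heapq
import itertools


class RequestQueue:
    """Hypothetical in-memory queue: a lower priority number is served first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # arrival index used as tie-breaker

    def submit(self, request, priority=0):
        # Entries are (priority, arrival index, request). The arrival index
        # keeps requests with equal priority in first-in, first-out order.
        heapq.heappush(self._heap, (priority, next(self._counter), request))

    def next_request(self):
        # Return the most urgent pending request, or None if nothing is queued.
        if not self._heap:
            return None
        _, _, request = heapq.heappop(self._heap)
        return request

    def __len__(self):
        return len(self._heap)


queue = RequestQueue()
queue.submit("transcribe call-1.wav")                        # default priority
queue.submit("identify speaker in call-2.wav", priority=-1)  # high priority
queue.submit("re-index archive", priority=5)                 # low priority
queue.submit("transcribe call-3.wav")                        # default priority

# Drain the queue the way a worker would when capacity becomes available.
served = []
while len(queue):
    served.append(queue.next_request())
```

In this sketch the high-priority request jumps ahead of the two default requests, which keep their arrival order, and the low-priority request is served last. The caller only submits work and waits, which mirrors the behaviour described for the application layer.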

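The results caching and data privacy behaviour described in the capabilities list can be sketched in a similar way. This is a minimal illustration under stated assumptions, not SPE's implementation: the `ResultsCache` class, the `(recording id, technology)` cache key, and the `fake_stt` stand-in for a real technology run are all hypothetical.

```python
class ResultsCache:
    """Hypothetical cache keyed by (recording id, technology name)."""

    def __init__(self):
        self._results = {}

    def get_or_compute(self, recording_id, technology, compute):
        # Run the technology only if this recording has not already been
        # processed by it; otherwise return the stored result immediately.
        key = (recording_id, technology)
        if key not in self._results:
            self._results[key] = compute()
        return self._results[key]

    def purge(self, recording_id):
        # Drop every cached result that belongs to a deleted recording.
        self._results = {
            key: value
            for key, value in self._results.items()
            if key[0] != recording_id
        }


calls = []


def fake_stt():
    # Stand-in for an expensive processing run; records each invocation.
    calls.append("stt")
    return {"text": "hello"}


cache = ResultsCache()
first = cache.get_or_compute("rec-1", "STT", fake_stt)   # computed
second = cache.get_or_compute("rec-1", "STT", fake_stt)  # served from cache
cache.purge("rec-1")                                     # recording deleted
third = cache.get_or_compute("rec-1", "STT", fake_stt)   # computed again
```

Here `fake_stt` runs twice in total: once for the first request and once after the purge, while the second request is answered from the cache without any processing.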
Understand SPE configuration

Basic explanation of configuration directives for SPE with hints & tips. Overview of phxspe.properties for beginners.