Phonexia Speech Engine

About

Phonexia Speech Engine v3 (SPE3) is the main execution component of the Phonexia Speech Platform. It is a server application with a REST API through which you can access all available speech technologies. Both 64-bit Linux and 64-bit Windows operating systems are supported.

Phonexia Speech Engine (SPE3) is a configurable server component that houses all speech technologies. SPE3 provides a RESTful application programming interface for accessing them. Aside from the technologies themselves, SPE implements supporting functionality for working with speech technologies, recordings, and streams.

Features

The main purpose of SPE is to serve as the processing unit for all Phonexia technologies. However, SPE is not limited to running the technologies; it also provides the following functionality:

  • entity oriented – when any recording or stream is processed with any technology, SPE keeps information about that recording or stream for as long as it exists. Once the recording is deleted or the stream ends, SPE removes all related information, metadata, and technology results from the database.
  • file processing and stream processing (depending on the available technologies). HTTP and RTP streams are supported.
  • user management – several roles are defined, each with different rights. This lets SPE users work with their own data only and prevents them from seeing the recordings, metadata, or technology results of other users.
  • load management – SPE queues incoming requests and serves them one by one, according to the capacity of the current installation. A user or partner application can therefore submit any number of requests and simply wait until all of them are answered.
  • audio management – SPE can split stereo recordings into separate channels, cut one audio file into several files, save an incoming stream, and more.
  • flexibility in providing results – results are returned in XML or JSON format and can be obtained in several ways: polling, WebSockets, or webhooks.
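
Of the three ways to obtain results, polling is the simplest to sketch. The Python helper below is only an illustration under stated assumptions: `poll_until_done` and the `fetch` callable are hypothetical names, not part of the SPE API. `fetch` stands in for whatever HTTP request your client sends to check on a pending result; the real endpoints are described in the API reference.

```python
import time
from typing import Callable, Optional


def poll_until_done(fetch: Callable[[], Optional[dict]],
                    interval_s: float = 1.0,
                    timeout_s: float = 60.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch() repeatedly until it returns a result or the timeout expires.

    fetch is a hypothetical callable (an assumption, not an SPE function):
    it returns None while the request is still pending and a result dict
    once processing has finished.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("no result before the timeout expired")
        # Wait before asking again so the server is not flooded with requests.
        sleep(interval_s)
```

Because SPE queues requests and serves them according to capacity, a client that polls would typically pick an interval that matches the expected processing time rather than polling as fast as possible. WebSockets or webhooks avoid the repeated requests altogether.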

Speech technologies available

SPE3 provides partners with a complete portfolio of speech technologies:

  • Speaker Identification (SID)
  • Language Identification (LID)
  • Gender Identification (GID)
  • Age Estimation (AGE)
  • Voice Activity Detection (VAD)
  • Speech Quality Estimation (SQE)
  • Speaker Diarization (DIAR)
  • Speech to Text (STT), 10+ languages available
  • Keyword Spotting (KWS), 10+ languages available
  • Time Analysis (TAE)

First Steps

In general, follow these steps to run SPE3 as a stand-alone server:

  1. Download the SPE3 package for your platform (Linux or Windows; your Phonexia contact will provide it)
  2. Download the license file (your Phonexia contact will provide it)
  3. Unzip the package
  4. Copy the license file into the “SPE3” folder
  5. Open a command line as administrator
  6. Run phxadmin.exe from the command line and configure the technologies
  7. Run phxadmin.exe from the command line and set the user name and password for your SPE3 user account
  8. Run phxspe.exe from the command line to start SPE
  9. Your SPE3 server is now running, and you can connect to it via the IP address and port set in the properties file (./settings/phxspe.properties)

For details on steps 6-9, see ./doc/INSTALL.html (included in the download package).
The API documentation is available in ./doc/api_reference.html or online at https://download.phonexia.com/docs/spe/
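
The last step points to the address and port configured in ./settings/phxspe.properties. As a rough sketch, a client script could read those two values and build the base URL of the server. This assumes the file uses plain `key = value` lines and that the server is reached over plain HTTP. The helper names `parse_properties` and `base_url` are made up for this example, and the key names are deliberately left as parameters: look up the real ones in your own properties file or in INSTALL.html.

```python
def parse_properties(text: str) -> dict:
    """Parse simple 'key = value' lines (an assumed format for this sketch).

    Blank lines, lines starting with '#' or '!', and lines without '='
    are skipped. Only the first '=' on a line separates key from value.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        props[key.strip()] = value.strip()
    return props


def base_url(props: dict, host_key: str, port_key: str) -> str:
    """Build an HTTP base URL from two property keys supplied by the caller.

    host_key and port_key are placeholders: this sketch does not know the
    real key names used by SPE.
    """
    return f"http://{props[host_key]}:{props[port_key]}"
```

A typical use would be to read the file with `open("./settings/phxspe.properties", encoding="utf-8").read()`, pass the text to `parse_properties`, and then call `base_url` with the two key names found in that file.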

Note: For more details, see Running SPE3 for the first time.

Architecture and components

SPE is an application that runs from the command line or as a service. Apart from the main binary itself, SPE requires a database, which can be either SQLite (delivered inside the Phonexia package) or MySQL. No other components are needed.

Structure of technologies and technology models

From a technical point of view, every technology can work with different technology modules. Examples are the various languages for STT (CS_CZ4, EN_US4) or the various sizes for SID (L3, XL3). A technology can work with a single module or with any number of installed modules.

Inside SPE, the core application sits alongside the technologies and technology modules. All technologies are located in the ./bsapi sub-folder. Every technology has its own sub-folder, named after the technology's abbreviation, which can contain one or more modules. The modules are stored deeper in the directory structure.
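
To see which technologies an installation contains, you can inspect the ./bsapi folder directly. The Python sketch below relies only on what is stated above: each immediate sub-folder of ./bsapi is named after a technology. The exact location of the modules below that level is not specified here, so the second helper simply lists every nested directory without claiming which of them are modules. Both function names are made up for this example.

```python
from pathlib import Path


def list_technologies(bsapi_dir: str) -> list:
    """Return the names of the immediate sub-folders of bsapi_dir, sorted.

    Per the description above, each sub-folder is named after a technology.
    Plain files in bsapi_dir are ignored.
    """
    root = Path(bsapi_dir)
    return sorted(p.name for p in root.iterdir() if p.is_dir())


def list_subfolders(bsapi_dir: str, technology: str) -> list:
    """Return all nested directories below one technology folder.

    Paths are relative to the technology folder and use '/' separators.
    Which of these directories hold modules depends on the installation;
    this sketch makes no assumption about that.
    """
    tech_root = Path(bsapi_dir) / technology
    return sorted(
        p.relative_to(tech_root).as_posix()
        for p in tech_root.rglob("*")
        if p.is_dir()
    )
```

For example, `list_technologies("./bsapi")` run from the SPE3 folder returns the technology sub-folder names present in that installation.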

Documentation and manual

Complete documentation of the SPE API can be found on the Phonexia web pages. The documentation includes:

  • description of all API endpoints
  • manuals for using the HTTP queries and understanding results
  • description of mechanics of SPE
  • API changes between SPE versions

Documentation for SPE installation and configuration is included in the ./doc/INSTALL.html file.

Releases and changelogs

You can browse the Releases and Changelogs for Speech Engine v3.
