Phonexia Speech Engine v3 (SPE3) is the main executive part of the Phonexia Speech Platform. It is a server application with a REST (REpresentational State Transfer) API through which you can access all available speech technologies. Both 64-bit Linux and 64-bit Windows operating systems are supported.
Phonexia Speech Engine (SPE3) is a configurable server component that houses all speech technologies. SPE3 provides a REST API for accessing the individual technologies. Aside from the technologies themselves, SPE3 implements additional functionality that supports work with speech technologies, recordings, streams, and more.
The main purpose of SPE is to serve as the processing unit for all Phonexia technologies. However, SPE is not limited to the technologies themselves; it also provides the following functionality:
- entity oriented – when processing any recording or stream with any technology, SPE keeps information about that recording or stream for as long as it exists. Once the recording is deleted or the stream ends, SPE removes all related information, metadata and technology results from the database.
- file processing and stream processing (dependent on available technologies). HTTP or RTP streams are supported.
- user management – several roles are defined, each with different rights. This lets individual SPE users work with their own data only and prevents them from seeing the recordings, metadata or technology results of other users.
- load management – SPE is able to queue incoming requests and serve them one by one, based on the capacity of the current installation. This means that a user or partner application can submit any number of requests and simply wait until all of them are answered.
- audio management – SPE is able to split stereo recordings, cut one audio file into several files, save an incoming stream, and more.
- flexibility in providing results – results are returned in XML or JSON format. Results can be obtained in several ways: polling, WebSockets or webhooks.
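Of the three ways of obtaining results, polling is the simplest to sketch on the client side. The example below is only an illustration under stated assumptions: `poll_until_done`, `make_fake_fetch` and the shape of the returned dictionary are hypothetical and do not come from the SPE API. In a real client, the `fetch` callable would wrap an HTTP request to whichever endpoint the API reference documents for the task in question.

```python
import time
from typing import Callable, Optional


def poll_until_done(
    fetch: Callable[[], Optional[dict]],
    interval_s: float = 1.0,
    timeout_s: float = 60.0,
) -> dict:
    """Call `fetch` repeatedly until it returns a result or the timeout expires.

    `fetch` is a hypothetical callable: it returns None while the request is
    still queued or being processed, and a result dict once it has finished.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("result not ready before the timeout expired")
        time.sleep(interval_s)


def make_fake_fetch(ready_after: int) -> Callable[[], Optional[dict]]:
    """Stand-in for a real HTTP call: 'not ready' a few times, then a result."""
    calls = {"n": 0}

    def fetch() -> Optional[dict]:
        calls["n"] += 1
        if calls["n"] <= ready_after:
            return None  # request is still waiting in the queue
        return {"state": "finished", "attempts": calls["n"]}

    return fetch


if __name__ == "__main__":
    print(poll_until_done(make_fake_fetch(ready_after=2), interval_s=0.01))
```

Because requests are queued and served according to the capacity of the installation, a client of this kind should choose a timeout generous enough to cover the time spent waiting in the queue, not only the processing time.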
Speech technologies available
SPE3 provides partners with a complete portfolio of speech technologies:
|Speaker Identification (Phonexia Speaker Identification, multiple generations availa...)|
|Language Identification (Phonexia Language Identification, multiple generations avail...)|
|Gender Identification (Phonexia Gender Identification)|
|Age Estimation (Phonexia Age Group Estimation)|
|Voice Activity Detection (Phonexia Voice Activity Detection)|
|Speech Quality Estimation (Phonexia Speech Quality Estimator)|
|Speaker Diarization (Phonexia Speaker Diarization)|
|Speech to Text (Phonexia Speech To Text, sometime also as Speech Transcripti...), 10+ languages available|
|Keyword Spotting (Phonexia Keyword Spotting - acoustics based ASR, several tec...), 10+ languages available|
|Time Analysis (Time Analysis Extraction)|
Generally, you should go through the following steps to run SPE3 as a stand-alone server:
1. Download the SPE3 package for your platform (Linux/Windows; your Phonexia contact will provide it).
2. Download the license file (your Phonexia contact will provide it).
3. Unzip the package.
4. Copy the license file into the “SPE3” folder.
5. Run your command line as administrator.
6. Run phxadmin.exe in your command line and configure the technologies.
7. Run phxadmin.exe in your command line and set up the name and password of your SPE3 user account.
8. Run phxspe.exe in your command line to start SPE.
9. Your SPE3 server is now running, and you can connect to it via the IP address and port set in the properties file (./settings/phxspe.properties).
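Once the server has been started, a quick way to confirm that it is reachable is to read the address and port from the properties file and try to open a TCP connection. The sketch below is a rough illustration, not an official tool. The key names `server.address` and `server.port` and the sample values are placeholders invented for this example, so check your own phxspe.properties for the real keys. The parser handles only plain `key = value` lines and comments, not the full properties syntax.

```python
import socket


def parse_properties(text: str) -> dict:
    """Parse simple `key = value` lines; skip blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '=' at all
            props[key.strip()] = value.strip()
    return props


def can_connect(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical key names and values, used only for demonstration.
    sample = "# example only\nserver.address = 127.0.0.1\nserver.port = 8600\n"
    props = parse_properties(sample)
    host, port = props["server.address"], int(props["server.port"])
    print(host, port, "reachable" if can_connect(host, port) else "not reachable")
```

A successful TCP connection shows only that something is listening on that port. Whether it is SPE3 and whether your user account works is best verified with a real request, as described in the API documentation.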
For details on points 6-9, see ./doc/INSTALL.html (included in the download package).
You may also want to consult the API documentation in ./doc/api_reference.html or on our website: https://download.phonexia.com/docs/spe/
Note: Learn more details about Running SPE3 for the first time.
Architecture and components
SPE is an application that runs from the command line or as a service. Apart from the main binary itself, SPE requires a database, which can be either SQLite (delivered inside the Phonexia package) or MySQL. No other components are needed.
Structure of technologies and technology modules
From a technical point of view, every technology can work with different technology modules. These are, for example, different languages for STT (CS_CZ4, EN_US4) or different sizes for SID (L3, XL3). A technology can work with a single module or with any number of installed modules.
Inside SPE there is the core application together with the technologies and technology modules. All technologies are included in the ./bsapi sub-folder. Every technology has a separate sub-folder named after the technology's abbreviation and can include one or more modules. Modules are stored further down in the directory structure.
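To see which technologies and modules an installation contains, you can walk the ./bsapi folder. The sketch below is a minimal illustration under assumptions: the text above does not say at which depth the modules sit, so the function simply lists every nested folder under each technology folder. The folder names `stt`, `sid` and `models` in the demo tree are made up for the example; only the module names CS_CZ4, EN_US4 and L3 are taken from the text.

```python
import tempfile
from pathlib import Path


def list_technology_folders(bsapi_root: Path) -> dict:
    """Map each technology sub-folder to the relative paths of its nested folders."""
    layout = {}
    for tech_dir in sorted(p for p in bsapi_root.iterdir() if p.is_dir()):
        nested = sorted(
            p.relative_to(tech_dir).as_posix()
            for p in tech_dir.rglob("*")
            if p.is_dir()
        )
        layout[tech_dir.name] = nested
    return layout


def build_demo_tree(root: Path) -> Path:
    """Create a made-up tree for demonstration; a real installation may differ."""
    bsapi = root / "bsapi"
    for rel in ("stt/models/CS_CZ4", "stt/models/EN_US4", "sid/models/L3"):
        (bsapi / rel).mkdir(parents=True)
    return bsapi


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo_root = build_demo_tree(Path(tmp))
        for tech, folders in list_technology_folders(demo_root).items():
            print(tech, folders)
```

Against a real installation, you would pass the path of the actual bsapi folder instead of the temporary demo tree.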
Documentation and manual
Complete documentation of the SPE API can be found on the Phonexia web pages. The documentation includes:
- description of all API endpoints
- manuals for using the HTTP queries and understanding results
- description of mechanics of SPE
- API changes between SPE versions.
Documentation for SPE installation and configuration is included in
Releases and changelogs
You can browse the Releases and Changelogs for Speech Engine v3.