Search: spe

127 results

Releases and Changelogs (SPE)

…Fixed: Different/incorrect output in STT for empty streams (all 6. gen models)
Fixed: Time Analysis Extraction does not return correct total length of audio
Fixed: Time Analysis Extraction returns crosstalks even for channel that is reported to not contain speech
Deprecated: Legacy Speaker Identification technology, i.e. all endpoints under /technologies/speakerid
Speech Engine 3.60 (Public release)
Speech Engine 3.60.1, DB v1901,…

Understand SPE configuration file

…is loaded during SPE startup, i.e. you need to restart SPE to apply any changes made in the file. (If for any reason you don’t run phxadmin after SPE installation, you can create the default configuration manually by copying the data/phxspe.properties.template file to settings/phxspe.properties.) NOTE: Phonexia Browser creates a configuration file named phxspe.browser.properties if it’s configured to use Speech Engine in…

Understand SPE directory structure

…database SQL scripts.

data
├── phxspe.properties.default
├── init.d-phxspe.template
├── phxspe.service.template
│
├── benchmark
└── database

phxspe.properties.default  Default phxspe.properties SPE configuration file
init.d-phxspe.template  Example SPE init.d script
phxspe.service.template  Example SPE systemd service unit file
benchmark  Default audio files for built-in benchmark functionality
database  Database SQL scripts for supported databases: SQLite, MariaDB and MySQL

The phxspe.properties.default file is used by phxadmin tool…

Understand SPE database

…by SPE users:
rest_model_sid  list of SID speaker models – name, owner (SPE user), modification timestamp
rest_model_sid_sources  list of files used as sources for SID speaker models creation
rest_model_sid_metafiles  list of files used as SID speaker models metafiles
rest_group_sid  list of SID speaker groups – name, owner (SPE user)
rest_group_sid_models  associations between SID speaker groups and speaker models
rest_voiceprint  SID…

SPE and Browser installation: standalone SPE

…Keyword Spotting Stream [disabled]
8) Language Identification LanguagePrint Comparator [disabled]
9) Language Identification LanguagePrint Extractor [disabled]
10) Speaker Identification 4 VoicePrint Extractor [disabled]
11) Speaker Identification 4 VoicePrint Comparator [disabled]
12) Speaker Identification 4 VoicePrint Calibration [disabled]
13) Speaker Identification 4 VoicePrint Stream Extractor [disabled]
14) Speaker Identification 4 VoicePrint Stream Comparator [disabled]
15) Speech Quality Estimation [disabled]
16) Speech…

Speaker Identification (SID)

…Identification is the case when we are asking “Whose voice is this?”, such as in fake emergency calls. Usually this entails one-to-many (1:n) or many-to-many (n:n) comparisons. Speaker Search is the case when we are asking “Where is this voice speaking?”, i.e. when looking for a speaker inside a large archive. We are dealing with Speaker Spotting when we…

Understand SPE configuration

Basic configuration: By default, SPE configuration is defined in the file {spe_root}/settings/phxspe.properties. This file is created automatically the first time the phxadmin utility is run, which is usually done for initial configuration of the system. Otherwise, a template can be found in {spe_root}/data/phxspe.properties.template, from which the actual configuration file may be populated and renamed manually. The format of the configuration file is…
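The manual fallback described in this snippet (copying the template into the settings directory) could be scripted roughly as follows. This is only a sketch: the helper name ensure_spe_config is hypothetical and not part of SPE, and the two paths are taken from the snippet above.

```python
import shutil
import tempfile
from pathlib import Path


def ensure_spe_config(spe_root: Path) -> Path:
    """Create settings/phxspe.properties from the template if it is missing.

    Hypothetical helper, not part of SPE; the paths follow the snippet above.
    An existing configuration file is never overwritten.
    """
    template = spe_root / "data" / "phxspe.properties.template"
    target = spe_root / "settings" / "phxspe.properties"
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(template, target)
    return target


if __name__ == "__main__":
    # Demonstrate on a throwaway directory tree, not a real SPE installation.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "data").mkdir()
        (root / "data" / "phxspe.properties.template").write_text("# template\n")
        print(ensure_spe_config(root))
```

The helper deliberately leaves an existing settings/phxspe.properties untouched, so running it twice does not discard manual edits.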

Understand SPE executable files

This article explains the purpose and usage of executables distributed in the SPE package: phxspe, phxclient, phxadmin and phxadmin2. phxspe: phxspe is the main SPE executable; launching this file starts SPE itself. Command line parameters supported by phxspe are listed below (use the appropriate OS-specific parameter separator, e.g. use --help in Linux and /help in Windows): Generic help – Show help…

Understand SPE technologies configuration file

…SQE_STREAM  Speech Quality Estimation Stream
STT  Speech To Text
STT_STREAM  Speech To Text Stream
TAE  Time Analysis Extraction
TAE_STREAM  Time Analysis Extraction Stream
VAD  Voice Activity Detection
VAD_STREAM  Voice Activity Detection Stream
SIDC  Speaker Identification Voiceprint Comparator (legacy)
SIDC_STREAM  Speaker Identification Voiceprint Stream Comparator (legacy)
SIDCALIBSET  Speaker Identification VoicePrint Calibration (legacy)
SIDCALIBSET_STREAM  Speaker Identification VoicePrint Stream Calibration (legacy)
SIDE  Speaker…

Understand SPE database scripts

…database server, and/or SPE has only limited access rights to the database for security reasons. Therefore we provide the SQL scripts and leave the SPE/database updates entirely to the SPE administrators. Scripts for SPE database initial setup and maintenance: create_schema.sql – script for creation of the DB schema required by SPE in a freshly created empty DB; init_data.sql – script for…

Phonexia Speech Engine

Phonexia Speech Engine (SPE) is the main part of Phonexia Speech Platform. SPE is a server application for 64-bit Linux or Windows, providing a REST API to the entire portfolio of Phonexia speech technologies. SPE capabilities overview:

Audio files and stream processing
                               Audio files   RTP / HTTP streams
Speaker Identification (SID)   ✓             ✓
Speech To Text (STT)           ✓             ✓
Keyword Spotting (KWS)         ✓…

SID: Speaker Identification: Results Enhancement

…– recordings from different speakers representing the source data, minimum 60 seconds net speech in each. The set must not contain duplicates or target speaker recordings. With FAR Calibration, the system is calibrated to a specific False Acceptance Rate (e.g., FAR = 1%) for each reference voiceprint (speaker model). Only one side (the enroll) is calibrated, using data representing the…

Understand SPE benchmark

…SPE in the {SPE}/data/benchmark directory. The second option uses a single audio file of your choice uploaded to SPE storage, specified by the path parameter. The set of audio files supplied with SPE contains recordings of various lengths (from 30 seconds to 5 minutes) and with various speech/non-speech ratios. This is to account for the fact that both the length of…

Speech to Text (STT)

…n-grams. Using this the user can adjust a language model focusing on a specific domain to get better results. Result types During the process of transcribing the speech there are always several alternatives for a given speech segment. The technology can provide one or more results. 1-best result type provides only the result with highest score. Speech is returned in…

Understand SPE audio converter

…phxspe.exe. FFmpeg: https://ffmpeg.org/download.html SoX: https://sourceforge.net/projects/sox/files/sox/ (FFmpeg is a somewhat ‘cleaner’ choice on Windows, since it is also available as a single-executable static build, unlike SoX, whose 10+ DLLs clutter up the SPE directory.) SPE configuration: As a next step, it is necessary to enable and set up the converter in the SPE configuration file (settings/phxspe.properties). Set audio_converter.enabled to true to enable…
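Setting audio_converter.enabled to true could be scripted along these lines. This is a minimal sketch that assumes the configuration file uses plain "key = value" lines with "#" comments; the snippets here do not confirm the file format, and set_property is a hypothetical helper, not an SPE tool.

```python
def set_property(text: str, key: str, value: str) -> str:
    """Return `text` with `key` set to `value`, appending the key if absent.

    Assumes a 'key = value' line format with '#' comments (an assumption,
    not confirmed by the SPE documentation snippet above).
    """
    lines = text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Leave comments and lines without '=' untouched.
        if stripped.startswith("#") or "=" not in stripped:
            continue
        name = stripped.split("=", 1)[0].strip()
        if name == key:
            lines[i] = f"{key} = {value}"
            replaced = True
    if not replaced:
        lines.append(f"{key} = {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "# converter settings\naudio_converter.enabled = false\n"
    print(set_property(sample, "audio_converter.enabled", "true"), end="")
```

The function works on text rather than on the file itself, so the result can be reviewed before it is written back to settings/phxspe.properties. According to the configuration-file result above, SPE reads the file at startup, so a restart is needed for the change to take effect.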