
Search: spe-3.11.1-win64

127 results

LID: Terminology and adaptation

…tool which SPE configuration file to use. The default Speech Engine configuration file is settings/phxspe.properties. However, when using Phonexia Browser in “SPE on localhost” mode (also known as “Embedded SPE”), the configuration file is settings/phxspe.browser.properties. (Make sure to use the right configuration file, otherwise you might register the language pack to a different configuration and therefore it won’t be visible where you…

SID: Speaker Identification: Results Enhancement

…– recordings from different speakers representing the source data, minimum 60 seconds net speech in each. The set must not contain duplicates or target speaker recordings. With FAR Calibration, the system is calibrated to a specific False Acceptance Rate (e.g., FAR = 1%) for each reference voiceprint (speaker model). Only one side (the enroll) is calibrated, using data representing the…
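To make the idea of calibrating to a target False Acceptance Rate more concrete, here is a minimal, generic sketch. It is not Phonexia's implementation: the function names, the "accept when score > threshold" rule, and the synthetic scores are all assumptions made for illustration. It only shows how a threshold could be picked from non-target (impostor) scores so that the empirical FAR stays at or below the target.

```python
def threshold_for_far(impostor_scores, target_far):
    """Pick a threshold so that the empirical FAR is <= target_far.

    Hypothetical helper: a trial counts as accepted when score > threshold.
    """
    if not impostor_scores:
        raise ValueError("need at least one impostor score")
    ranked = sorted(impostor_scores, reverse=True)
    # Number of false accepts we can tolerate (small epsilon guards float error).
    allowed = int(target_far * len(ranked) + 1e-9)
    if allowed >= len(ranked):
        return float("-inf")  # every impostor may be accepted
    # At most `allowed` scores are strictly greater than this value.
    return ranked[allowed]


def empirical_far(impostor_scores, threshold):
    """Fraction of impostor scores that would be (falsely) accepted."""
    accepted = sum(1 for s in impostor_scores if s > threshold)
    return accepted / len(impostor_scores)


# Synthetic impostor scores against one reference voiceprint (assumed data).
scores = [float(i) for i in range(100)]
thr = threshold_for_far(scores, target_far=0.05)
print(thr, empirical_far(scores, thr))
```

In this sketch the threshold is derived separately for each reference voiceprint from scores of other speakers' recordings, which mirrors the "only one side is calibrated" description only loosely.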

Key Features (PSP)

…The Speech Platform includes the following technologies. Technologies are available in the Speech Engine component based on its particular configuration (Voice Biometrics, Transcription System, etc.) Speaker Identification (SID) – recognizes a speaker automatically based on their voice, Speaker Diarization (DIAR) – separates multiple speakers in mono audio automatically, Language Identification (LID) – detects the language or dialect spoken in a…

Understand SPE benchmark

SPE in the {SPE}/data/benchmark directory. The second option uses a single audio file of your choice uploaded to SPE storage, specified by the path parameter. The set of audio files supplied with SPE contains recordings of various lengths (from 30 seconds to 5 minutes) and with various speech/non-speech ratios. This is to account for the fact that both the length of…

Speech to Text (STT)

…n-grams. Using this the user can adjust a language model focusing on a specific domain to get better results. Result types During the process of transcribing the speech there are always several alternatives for a given speech segment. The technology can provide one or more results. 1-best result type provides only the result with highest score. Speech is returned in…

Understand SPE audio converter

phxspe.exe. FFmpeg: https://ffmpeg.org/download.html SoX: https://sourceforge.net/projects/sox/files/sox/ (FFmpeg is a somewhat ‘cleaner’ choice on Windows, since it is also available as a single-executable static build, unlike SoX, whose 10+ DLLs clutter up the SPE directory.) SPE configuration As a next step, it’s necessary to enable and set up the converter in the SPE configuration file (settings/phxspe.properties). Set audio_converter.enabled to true to enable…

Understand SPE home directory

The SPE home directory is analogous to a user home directory in operating systems (e.g. /home/ in *nix, /Users/ in macOS or Windows, etc.) – it is the place where SPE stores data for users configured in SPE. The default SPE home directory location is {SPE_installation_directory}/home/. This location can be changed using the server.user.home setting in the phxspe.properties SPE configuration file. Changing the home…

Understand SPE connectors for external TTS

…expected to provide information about actual TTS service capabilities: list of voice names, supported languages and audio quality (sampling frequencies). This info is used during SPE startup sequence – TTS connectors enabled in SPE configuration file are started with –info parameter and SPE reads the connector output. Connectors failing to provide the info won’t be available for use with SPE….

Q: I am getting SPE related error after starting the Browser (e.g. SPE server crashed, Error Downloading…, unable to connect to the SPE server, unable to start the localhost…)

…free space in Windows Explorer and select “open command window here”). Run PhxBrowser software with the command PhxBrowser.exe /spe-debug /spe-output; PhxBrowser software will start with an “SPE output” tab, which shows the debug output of SPE. Linux: Run PhxBrowser software in a terminal with the command ./PhxBrowser --spe-debug --spe-output; PhxBrowser software will start with an “SPE output” tab, which shows the debug output of SPE.

Understand SPE user accounts

…prioritization section in the REST API documentation maximum pending requests – legacy REST Server 2.x attribute, ignored in SPE 3.x It’s important to realize that each SPE user account has its own home directory, where SPE stores the account’s data, see Understanding SPE home directory article. It means that by default the accounts’ data is isolated from each other. Therefore,…

Download Speech Platform

…Standalone mode – the recommended setup, requiring some manual steps using command line Further information resources Speech Engine REST API documentation online: https://download.phonexia.com/docs/spe/ offline: {SPE_directory}/doc/api_reference.html or http://{SPE_address:port}/doc Speech Engine technical documentation check the Speech Engine section and the “Understand…” articles listed in the left menu tutorials and training videos see technologies introduction video below and SPE Training videos section https://youtu.be/DDu0Y1rgQ6k…

Releases and Changelogs (VIN)

…Target score distribution Fixed: Population Set selected correctly even if renamed in the selection window Improved: Speech length display in the case view: added “Unlimited” option to display the speech length permanently Improved: SID Evidence score aligned with Speech Engine output of SID score Removed: Speech length compensation Voice Inspector 5.1 Voice Inspector 5.1.0, BSAPI 3.60.0 (2023-12-07) New: A generalized…

Speech Engine update

…concerns the SPE configuration file, especially when updating from a rather old SPE version) do the changes – again, a visual comparison tool makes this step much easier! test the updated installation Example of update process for SPE: Stop running SPE Make a backup of SPE (see Understand SPE administration and backup) Optionally(!) delete bsapi directory – this decision is fully…

Measuring of a software processing speed – what is the FtRT (Faster than Real Time)

…in our example is 36 seconds. After stripping silence, 14 seconds remain – this means that the original audio contains 38% of net speech and 62% of silence. Phonexia speech technologies analyze the entire recording, but pick only the speech segments for AI processing, i.e. the absolute processing time will be practically the same… Creating voiceprint by Speaker Identification took:…
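The arithmetic behind these figures can be sketched as follows. This is an illustrative snippet, not part of the original article: the function names are hypothetical, the FtRT formula (audio length divided by processing time) is an assumption about how the metric is usually defined, and the processing time used in the example is a made-up placeholder because the measured value is truncated in the snippet above. With the rounded durations quoted (36 s total, 14 s net speech), the ratio comes out near 39%, close to the quoted 38%; the underlying durations are presumably not whole seconds.

```python
def net_speech_ratio(total_seconds, speech_seconds):
    """Share of the recording that is net speech, as a fraction of 1."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return speech_seconds / total_seconds


def faster_than_real_time(audio_seconds, processing_seconds):
    """Assumed FtRT definition: how many times faster than playback."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


speech = net_speech_ratio(36, 14)
print(f"net speech: {speech:.1%}, silence: {1 - speech:.1%}")

# Placeholder processing time (assumption, not a measured Phonexia result).
hypothetical_processing_seconds = 2.0
print(f"FtRT: {faster_than_real_time(36, hypothetical_processing_seconds):.1f}x")
```

Whether FtRT is computed against the full recording length or only the net speech changes the number considerably, so it is worth stating which one a given benchmark uses.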