Search: speech engine

63 results

Releases and Changelogs (VIN)

…Target score distribution
Fixed: Population Set selected correctly even if renamed in the selection window
Improved: Speech length display in the case view: added “Unlimited” option to display the speech length permanently
Improved: SID Evidence score aligned with Speech Engine output of SID score
Removed: Speech length compensation
Voice Inspector 5.1
Voice Inspector 5.1.0, BSAPI 3.60.0 (2023-12-07)
New: A generalized…

STT: Configuring word detection parameters for stream transcription

One of the improvements implemented since Speech Engine 3.24 is a neural-network-based VAD, used for word and segment detection. This article describes the segmenter configuration parameters and how they affect the realtime stream STT results. The default segmenter parameters are:
[vad.online_segmenter:SOnlineVoiceActivitySegmenterI]
backward_extensions_length_ms=150
forward_extensions_length_ms=750
speech_threshold=0.5
Backward and forward extensions are intervals in milliseconds, which extend the part…
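As a rough illustration of what extension intervals of this kind usually do, the sketch below pads detected speech segments with the default values quoted above (150 ms backward, 750 ms forward). It assumes that the backward extension moves a segment start earlier and the forward extension moves its end later, and that padded segments which overlap are merged; the function name `extend_segments` and the merging rule are hypothetical and are not taken from Speech Engine.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # (start_ms, end_ms)


def extend_segments(
    segments: List[Segment],
    backward_ms: int = 150,  # default quoted in the snippet above
    forward_ms: int = 750,   # default quoted in the snippet above
) -> List[Segment]:
    """Pad each segment and merge neighbours that overlap after padding.

    Assumption: this mirrors the general idea of segment extensions,
    not the actual Speech Engine segmenter logic.
    """
    extended: List[Segment] = []
    for start, end in sorted(segments):
        start = max(0, start - backward_ms)  # never go before the stream start
        end = end + forward_ms
        if extended and start <= extended[-1][1]:
            # Padded segment overlaps the previous one: merge them.
            prev_start, prev_end = extended[-1]
            extended[-1] = (prev_start, max(prev_end, end))
        else:
            extended.append((start, end))
    return extended


print(extend_segments([(1000, 2000), (2500, 3000)]))  # [(850, 3750)]
```

In this sketch, two segments 500 ms apart collapse into one because the forward extension of the first reaches past the padded start of the second; a smaller forward extension would keep them separate.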

Waveform Denoiser (DENOISER)

…software cannot remove unwanted speech or music in the background. Denoiser is used to remove noise from the recording and, at the same time, to amplify the speech signal for: better intelligibility for human listeners (recommended use), and better results with automatic speech recognition technologies (this must be tested on customer data first). Input: audio file (format details – see…

SID: Speaker Identification: Results Enhancement

Speaker Identification (SID) Results Enhancement is a process that adjusts the score threshold for detecting/rejecting speakers by removing the effect of speech length and audio quality. This is achieved by using Audio Source Profiles, which represent as closely as possible the source of the speech recording (device, acoustic channel, distance from microphone, language, gender, etc.). Although the out-of-the-box system…

SID4 performance on Intel® Xeon® Platinum 8124M

…line [FTRT based on net speech] demonstrates system performance based only on “net speech”. In other words, it shows the situation when 100% of the recordings’ duration contains speech (or utterance). This metric is an exact engineering measure, but it does not fully reflect real-world conditions. Orange bar, CPU core, shows how many physical cores are available on the tested system. Blue bar,…

Licensing (technical details)

…profile can be created either using a separate hw-gen tool downloadable below, or by running Phonexia Speech Engine with a hwgen parameter (see more details below). The profile is saved into a text file – usually named hw-info.txt – as a hash. Any major change in the HW profile – including OS version change, e.g. after upgrade or patch! –…

Understand SPE executable files

…– Transfer data from REST SERVER v2 to Speech Engine v3. Also requires the v2-properties and v2-cwd parameters.
v2-properties=<file> – Path to the REST SERVER v2 bsapirest.properties configuration file
v2-cwd=<path> – Path to the REST SERVER v2 working directory, usually the REST SERVER v2 bin directory
phxadmin2
phxadmin2 is an automation- and scripting-friendly command-line SPE administration utility for user management,…

Support Lifecycle Policy (PSP)

The general lifecycle of Phonexia products is governed by the Phonexia Product Support and Lifecycle Policy (valid from Q3/2019). That document also defines the content of our support and our software versioning approach. Specific versions of our products and languages are supported and maintained according to the following tables.
Phonexia Speech Engine
Version | Release Date | End of Support | Maintained Until | Release type…

Time Analysis Extraction (TAE)

…dialogue. This can be used to improve calls between operators and callers or to indicate potential stress points in phone calls (for example, a change of speech speed during the conversation). Input: TAE can process both audio files and streams (for format details see Speech Engine documentation). By its nature, TAE is usable mainly on two-channel phone call recordings, where…

Understand SPE benchmark

…SPE in the {SPE}/data/benchmark directory. The second option uses a single audio file of your choice uploaded to SPE storage, specified by the path parameter. The set of audio files supplied with SPE contains recordings of various lengths (from 30 seconds to 5 minutes) and with various speech/non-speech ratios. This is to account for the fact that both the length of…

Support

Support is available 5 business days a week (Monday – Friday), 8 business hours a day (09:00 – 17:00 CET), in English. If you have an issue with Speech Engine, please include a report in the ticket to help the support staff resolve it faster: Go to the Speech Engine installation directory. Open a command line/terminal (in Ubuntu Linux Right…

Understand SPE home directory

…uploading a file using POST /audiofile physically creates the file on the filesystem in the storage location… and the file stays there until it’s explicitly deleted using DELETE /audiofile. There might be various reasons NOT to use the REST API for uploading files to the Speech Engine, e.g. to spare the server the unwanted load caused by many uploads and/or big files……

Manuals

This section collects links or locations of manuals for specific Phonexia Speech Platform components. API Phonexia Speech Engine REST API – SPE – latest version manual online (api_reference.html for your version is located in doc subdirectory in SPE folder or distribution ZIP) Brno Speech Application Interface v3 – BSAPI3 – latest version manual online Applications and Tools Phonexia Browser –…

Understand SPE technologies configuration file

…SQE_STREAM Speech Quality Estimation Stream
STT Speech To Text
STT_STREAM Speech To Text Stream
TAE Time Analysis Extraction
TAE_STREAM Time Analysis Extraction Stream
VAD Voice Activity Detection
VAD_STREAM Voice Activity Detection Stream
SIDC Speaker Identification Voiceprint Comparator (legacy)
SIDC_STREAM Speaker Identification Voiceprint Stream Comparator (legacy)
SIDCALIBSET Speaker Identification VoicePrint Calibration (legacy)
SIDCALIBSET_STREAM Speaker Identification VoicePrint Stream Calibration (legacy)
SIDE Speaker…

STT: How to properly convert Confusion Network results to One-best

Confusion Network output is the most detailed Speech Engine STT output, as it provides multiple word alternatives for individual time slots of the processed speech signal. Therefore, many applications want to use it as the main source of speech transcription and perform any conversion to less verbose output formats internally. This article provides the recommended way to do the conversion. Time slots and…
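For orientation only, a generic confusion-network-to-one-best conversion can be sketched as picking the most probable alternative in each time slot and dropping slots where the winner is an empty (null) word. This is a textbook approach and not necessarily the recommended procedure described in the article; the data layout (a list of slots, each a list of dictionaries with `word` and `posterior` keys) and the `<null>` marker are hypothetical and do not reflect the actual Speech Engine output format.

```python
from typing import Dict, List, Union

Alternative = Dict[str, Union[str, float]]  # {"word": ..., "posterior": ...}
Slot = List[Alternative]


def one_best(network: List[Slot], null_word: str = "<null>") -> List[str]:
    """Return the highest-posterior word of each slot, skipping null winners.

    Assumption: the input layout and the null marker are illustrative only.
    """
    words: List[str] = []
    for slot in network:
        if not slot:
            continue  # nothing to choose from in an empty slot
        best = max(slot, key=lambda alt: alt["posterior"])
        if best["word"] != null_word:
            words.append(best["word"])
    return words


network = [
    [{"word": "hello", "posterior": 0.9}, {"word": "hollow", "posterior": 0.1}],
    [{"word": "<null>", "posterior": 0.6}, {"word": "a", "posterior": 0.4}],
    [{"word": "world", "posterior": 0.7}, {"word": "word", "posterior": 0.3}],
]
print(one_best(network))  # ['hello', 'world']
```

The second slot contributes no word because its most probable alternative is the null marker, which is the usual way a confusion network expresses "probably nothing was said here".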