Search: multi

41 results

SPE and Browser installation: embedded SPE

…multimedia converter By default, the Speech Engine accepts only a limited list of audio formats. To process non-native formats, install a multimedia converter. The recommended software for this is FFmpeg. FFmpeg on Windows: Download the latest version from https://www.gyan.dev/ffmpeg/builds/ffmpeg-release-essentials.zip After unzipping the package, move the ffmpeg.exe executable to the /SPE/ directory. You can delete the rest…

SPE and Browser installation: standalone SPE

…LID technology, simply increase the number 1 to, for example, 5: <name>DIAR</name> <models> <item> <name>XL4</name> <n_instances>1</n_instances> <config_file/> </item> </models> 5. Optional: configure the multimedia converter By default, the Speech Engine accepts only a limited list of audio formats. To process non-native formats, install a multimedia converter. The recommended software for this is FFmpeg. FFmpeg on Windows…

FAQs (PSP)

…container Other audio formats must be converted to one of the natively supported formats using external tools. The SPE server can be configured to do this conversion automatically in the background; see the Understand SPE audio converter article. Good tools for converting unsupported formats to supported ones are FFmpeg (http://www.ffmpeg.org) or SoX (http://sox.sourceforge.net/). Both are multiplatform software tools for Microsoft Windows, Linux and Apple…

Phonexia Speech Engine

…or stream exists. Once the recording is deleted from SPE storage, or the stream has ended, SPE removes all information, metadata and technology results from the database. Basic user management: SPE allows defining multiple users with different user roles and user rights. Each SPE user has access only to its own data storage, files, metadata and processing results. Load management…

Q: How can I add a new language to LID?

A: There are multiple methods to train a new language; please see the article in Components > Speech Technologies > LID….

Sizing of the computing units for speech technologies

…cores = 64 GB Conclusion: The best computing performance can be expected from a CPU with l3_cache_size / #_of_physical_CPU_cores >= 2.5 MB. Memory bandwidth and speed are more important than CPU base frequency. Intel's TLB fixes for the Meltdown and Spectre issues affect performance. Important notice (valid for SPE3): due to internal SPE3 requirements you must multiply the required number of…
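
The cache-per-core rule of thumb quoted in this snippet can be checked with a few lines of Python. This is only a minimal sketch: the function names and the sample figures are hypothetical and chosen for illustration, and the 2.5 MB threshold is simply the value quoted in the snippet.

```python
def l3_cache_per_core_mb(l3_cache_mb: float, physical_cores: int) -> float:
    """Return the L3 cache available per physical core, in MB."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be a positive integer")
    return l3_cache_mb / physical_cores


def meets_cache_guideline(l3_cache_mb: float, physical_cores: int,
                          threshold_mb: float = 2.5) -> bool:
    """Check l3_cache_size / #_of_physical_CPU_cores >= threshold (2.5 MB by default)."""
    return l3_cache_per_core_mb(l3_cache_mb, physical_cores) >= threshold_mb


if __name__ == "__main__":
    # Hypothetical CPUs, not taken from the article.
    print(meets_cache_guideline(36.0, 12))  # 3.0 MB per core
    print(meets_cache_guideline(16.0, 8))   # 2.0 MB per core
```

The check deliberately counts physical cores only, matching the wording of the guideline.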

Licensing (technical details)

…and so on. In essence, FLS works similarly to floating licenses known from other software products: the original HW- or USB-locked license (see above) is NET-ified by the license server, so that it can be shared by multiple client machines (which would not be possible with the original HW/USB license). Clients using this license then communicate with the FLS…

Speaker Identification (SID)

…Smith!”. This approach of one-to-one (1:1) verification is also employed in Voice-As-a-Password systems, which can add further security to multi-factor authentication over the telephone. Large-scale automatic speaker identification is also successfully used by law enforcement agencies during investigation for the purposes of database searches and ranking of suspects. In later stages of a case, Forensic Voice Analysis uses smaller amounts…

Gender Identification (GID)

…generation of XL3 and L3 models) Output scoring: log-likelihood ratio (LLR) and score (0-1). The score can be interpreted as a percentage by multiplying it by 100. Typical use cases: filtering calls by gender, playing advertisements targeted at a specific gender, getting a quick demographic analysis of the recordings. The speed of Gender Identification is up to 150 FtRT (depending on the model)….

Speaker Diarization (DIAR)

…silence as well. The output of the technology can be log files with labels and/or split audio files or one new multichannel audio file. Typical use cases: preprocessing for other speech recognition technologies, labeling the parts of the utterance according to the speakers, splitting telephone conversations recorded in mono into several channels, identifying how many speakers are speaking in the recording….

Q: What are the supported audio formats?

…configured to do this conversion automatically in the background; see the Understand SPE audio converter article. Good tools for converting unsupported formats to supported ones are FFmpeg (http://www.ffmpeg.org) or SoX (http://sox.sourceforge.net/). Both are multiplatform software tools for Microsoft Windows, Linux and Apple OS X. Example of usage: FFmpeg: ffmpeg -i <source_audio_file_name> <output_audio_base_name>.wav This command converts any supported format/codec audio file to normalized…
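
For batch conversion, the FFmpeg command quoted in this snippet can be wrapped in a small script. The following is a minimal Python sketch, assuming ffmpeg is installed and available on PATH; the helper names build_ffmpeg_command and convert_to_wav are hypothetical, and only the `ffmpeg -i <source> <output>.wav` form shown in the snippet is used.

```python
import shutil
import subprocess
from typing import List


def build_ffmpeg_command(source_audio_file_name: str,
                         output_audio_base_name: str) -> List[str]:
    """Build the argument list for: ffmpeg -i <source> <output_base>.wav"""
    return ["ffmpeg", "-i", source_audio_file_name,
            f"{output_audio_base_name}.wav"]


def convert_to_wav(source_audio_file_name: str,
                   output_audio_base_name: str) -> None:
    """Run the conversion; raises if ffmpeg is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise FileNotFoundError("ffmpeg executable not found on PATH")
    subprocess.run(
        build_ffmpeg_command(source_audio_file_name, output_audio_base_name),
        check=True,
    )
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.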

KWS: Results explained

…the detected pronunciation. Start and end times are in HTK units. 1 HTK unit is 100 nanoseconds, so dividing the times by 10000 gives the time in milliseconds. Score is a log-likelihood ratio from the (-inf, +inf) interval. Confidence is a probability from the [0, 1] interval. To convert it to a percentage, multiply the confidence value by 100. Example This example of Keyword Spotting…
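
The two conversions described in this snippet are simple arithmetic. A minimal Python sketch, with hypothetical function names and sample values:

```python
HTK_UNITS_PER_MS = 10_000  # 1 HTK unit = 100 ns, so 10,000 units = 1 ms


def htk_to_ms(htk_time: int) -> float:
    """Convert a time in HTK units (100 ns each) to milliseconds."""
    return htk_time / HTK_UNITS_PER_MS


def confidence_to_percent(confidence: float) -> float:
    """Convert a confidence from the [0, 1] interval to a percentage."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in the [0, 1] interval")
    return confidence * 100.0


if __name__ == "__main__":
    # Hypothetical values, not taken from a real KWS result.
    print(htk_to_ms(12_500_000))       # start time in milliseconds
    print(confidence_to_percent(0.5))  # confidence as a percentage
```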

SID4 performance on Intel® Xeon® Platinum 8124M

…times for each number of used cores (physical and virtual). Collected data are saved in a CSV file. FtRT numbers are calculated as the median of the collected measurements. Total system performance is a simple multiplication of the computed FtRT equivalent. Measuring software processing speed: what is FtRT (Faster than Real Time)? Understanding of the methodology: At the beginning, our…
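
The median step mentioned in this snippet can be sketched in Python as follows. This is an illustration under stated assumptions, not the article's measurement script: it assumes FtRT is the ratio of audio duration to processing time (the snippet is truncated before the definition), and the function names and the (audio_seconds, processing_seconds) input layout are hypothetical.

```python
from statistics import median
from typing import Iterable, Tuple


def ftrt(audio_seconds: float, processing_seconds: float) -> float:
    """Assumed definition: seconds of audio processed per second of wall-clock time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def median_ftrt(measurements: Iterable[Tuple[float, float]]) -> float:
    """Median FtRT over (audio_seconds, processing_seconds) measurements."""
    return median(ftrt(audio, proc) for audio, proc in measurements)


if __name__ == "__main__":
    # Hypothetical measurements: 60 s of audio processed in 1 s, 2 s and 3 s.
    print(median_ftrt([(60.0, 1.0), (60.0, 2.0), (60.0, 3.0)]))
```

The total-system multiplication is left out on purpose, because the snippet does not state what the FtRT figure is multiplied by.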

STT: How to properly convert Confusion Network results to One-best

Confusion Network output is the most detailed Speech Engine STT output, as it provides multiple word alternatives for individual time slots of the processed speech signal. Therefore many applications want to use it as the main source of speech transcription and perform any conversion to less verbose output formats internally. This article provides the recommended way to do the conversion. Time slots and…

Phonexia Partner Program for Government Partners

…the Starter Kit during the onboarding period? Yes, the Starter Kit can be purchased anytime during our cooperation. Can I purchase the Starter Kit multiple times? Yes, for each project, proof of concept, and product line, you can purchase a Starter Kit again. Phonexia consultants can’t wait to support your business. How do you deliver technical training? Phonexia technical training…