Search: essential

13 results

Speaker Diarization (DIAR)

Speaker Diarization labels segments of the same voice(s) in a single mono-channel audio recording, based on each individual speaker's voice. It is a language-, domain-, and channel-independent technology. It segments not only speakers but also technical signals and silence. The outputs of the technology can be both log files with labels and/or split audio files/one new…

Documentation (VIN)

Partners and customers are encouraged to read the Voice Inspector End User Manual, available as VIN-manual.pdf in the application’s installation directory. The manual can also be accessed from within the application by pressing F1, or by selecting “Help > User guide” in the Menu bar. You might be interested in reading the following information in the manual: Introduction, Technical Requirements…

Speech Quality Estimation (SQE)

Phonexia’s Speech Quality Estimation quantifies the acoustic quality of recordings. This helps the user quickly determine whether the acoustic quality of a recording is good enough for processing with other speech technologies. In response to an SQE request, SPE returns a JSON/XML file. This file includes general information about the technology and statistics of all (one or two)…

Understand SPE audio converter

…tool, you can upload essentially any audio or even video file to SPE and it will be automatically converted to an audio format supported natively by SPE. ⓘ NOTE: The automatic conversion is done only when uploading audio files to SPE; it is not done when registering files! For more info about uploading/registering audio files, see the Understanding SPE home directory article. Converter installation As a…

Time Analysis Extraction (TAE)

Technology description Time Analysis Extraction (TAE) by Phonexia extracts basic information from dialogue in a recording, providing essential knowledge about conversation flow. That makes it easy to identify: long reaction time; crosstalk; responses of speakers in both channels; speed of speech measured in phonemes per second. Typical usage domain It is typically used in contact centers for indicating weak moments in…

Phonexia Ethical Code

Application of the Code It is the policy of Phonexia, s.r.o. (“Phonexia”, “we”) to maintain the highest level of ethical standards in the conduct of our business affairs. Our values guide our actions in all cases. The actions and conduct of our officers, directors and employees (collectively, “Phonexia personnel”), as well as others acting on our behalf, are essential to…

Phonexia Academy

About The main idea of the Phonexia Academy is to help partners understand the market and Phonexia’s products and technologies. Sell more, deliver your projects on time and at the highest quality, and support your clients effectively. We provide the following trainings: Phonexia technologies introduction (online video course); Technical Training Essentials (online video course); Technical Training Advanced – 2 courses: Voice Biometrics…

Q: What are the requirements for SID evaluation dataset?

…recordings in order to meet the criteria of at least 3 recordings for each speaker is not the right way to proceed. This way you are not adding any details. You are essentially analyzing details of a single recording five times. In contrast, by using 5 unique recordings coming from different audio environments or even different times of the day,…

Phonexia End User License Agreement

…foregoing limitations, exclusions and disclaimers shall apply to the maximum extent permitted by applicable law, even if any remedy fails its essential purpose. 15. APPLICABLE LAW. This Agreement shall be governed by, and construed in accordance with, the Civil Code and other laws of the Czech Republic excluding conflicts of laws principles. Should any dispute arising under an agreement fail…

Keyword Spotting (KWS)

…supports global keywordlist-wide threshold and also optional thresholds for individual keywords (if used, threshold set on keyword level overrides the global threshold). Speech Engine (SPE) supports only thresholds on keyword level. Setting the right threshold is essential for getting relevant results and generally greatly increases the accuracy of the technology. However, setting the right threshold can get tricky due to…

Input audio quality

…audio codec, heavy compression, too low bitrate, etc. can damage or even completely destroy essential parts of the audio signal required by speech technologies. Commonly used audio compression methods exploit the perceptual limitations of human hearing and can remove frequencies that are masked by other frequencies, etc. Therefore, to get satisfactory results from speech technologies, use an appropriate audio format. ⓘ…

SPE and Browser installation: embedded SPE

…multimedia converter By default, the Speech Engine accepts only a limited list of audio formats. To process non-native formats, install a multimedia converter. The recommended software for this is FFmpeg. FFmpeg on Windows Download the latest version from https://www.gyan.dev/ffmpeg/builds/ffmpeg-release-essentials.zip After unzipping the package, move the ffmpeg.exe executable to the /SPE/ directory. You can delete the rest…

SPE and Browser installation: standalone SPE

…Download the latest version from https://www.gyan.dev/ffmpeg/builds/ffmpeg-release-essentials.zip After unzipping the package, move the ffmpeg.exe executable to the /SPE/ directory. You can delete the rest of the contents of the ffmpeg-release-essentials package; they will not be needed. FFmpeg on Linux Run the following commands: sudo apt update && sudo apt upgrade; sudo apt install ffmpeg Open the /SPE/settings/phxspe.properties file with notepad and…