Search: channel

29 results

FAQs (PSP)

…Q: What is the difference between the on-the-fly and off-line types of speech-to-text transcription (STT)? A: Like a human listener, the ASR (STT) engine adapts to the acoustic channel, environment and speaker. The engine also learns more about the content over time, and this information is used to improve recognition….

FAQs (Voice Verify)

…accepted. There is currently no mechanism to detect which channel in a stereo or multi-channel recording contains the voice of the desired speaker. For that reason, the admin of Voice Verify must ensure that recordings used for voiceprint creation are mono and contain only the voice of the desired speaker. Q: What are the audio/stream quality…

Understand SPE audio converter

SPE directly supports a limited list of audio formats (codecs and containers), see Supported audio formats FAQ. Other audio formats must be converted using external tools. This conversion can be done either completely outside of SPE, before passing the files to SPE, or you can set up SPE to convert the files automatically. Then, depending on the capabilities of the conversion…

Speaker Identification

…in a recording are also unique, thus the technology can be language-, accent-, text-, and channel-independent. How does Speaker Identification work? Automatic speaker recognition systems extract the features from a voice to a voiceprint. A voiceprint is a small numerical representation of the voice, capturing the most unique characteristics of a speaker’s voice. The whole voice verification process consists of…

Phonexia Speech Engine

…audio manipulation: SPE has built-in basic audio file manipulation functionality, such as separating individual channels from stereo recordings, cutting one audio file into several files, saving audio from an incoming stream to a file, and others. Stream audio player: To support voicebot scenarios, SPE can play audio files directly to an output RTP stream. External Text-to-speech (TTS) integration: Easy integration with external TTS…

Q: What are the requirements for SID evaluation dataset?

…in each recording (i.e. usually 2+ minutes recording length); only one speaker in each recording; wide variety of gender and age is recommended; recordings should be as similar to the target use case as possible (device, channel, distance from mic, languages distribution); audio files should be mono, lin16 format, 8 kHz+ sample rate. *Note: splitting single recording into multiple shorter…

Gender Identification (GID)

Gender Identification is a language-, domain- and channel-independent technology that uses the acoustic characteristics of the recording to determine the gender of the speaker in question. This technology is able to distinguish between two genders: Male (M) and Female (F). Minimum of speech signal for identification: 7+ sec recommended (with XL4 and L4 model (9+ sec for previous generation of…

Voice Activity Detection (VAD)

Voice Activity Detection is a language-, domain- and channel-independent technology that identifies parts of audio recordings with speech content vs. non-speech content. It creates labels for speech and other signals in the recording; this can then serve as a decision point whether to process the recording by other technologies or not. VAD is usually part of rapid filtration process in…

Multi-server deployment

…components in the Client infrastructure may change. A change in any component that provides the channel connection from the Customer to Phonexia Voice Verify can affect the accuracy of the system. If any component affecting the channel changes, Phonexia recommends creating a new evaluation set, having an evaluation made (by Phonexia) and utilizing a new calibration profile…

Releases and Changelogs (Browser)

…path to temporary directory contains certain accented characters. Fixed: Licensing errors not visible before exiting application. Phonexia Browser 3.18.0, BSAPI 3.22.0 (2019-10-03): New: Waveform editor can now process stereo file by Diarization in per-channel mode. New: Added Gender balance and Score sharpness in Settings -> Scoring. New: Multiple columns in Result pane can be turned on/off at once using context…

Orbis 1.1.0 Release Notes

…case number, suspects etc. Maximal speech length for the voiceprint extraction: To optimize performance, we used a constraint on the total length of speech captured during voiceprint extraction. This allows the Orbis system to perform more extractions per minute, especially for the longer audio recordings. Only one channel processing: To optimize performance, the option of processing only one channel (out…