Search: cti audio stream

77 results

Speaker Identification (SID)

…are monitoring a large number of audio recordings or streams and we are looking for the occurrence of a specific speaker(s). Speaker spotting can be deployed for the purpose of Fraud Alert. Speaker Verification is the case when we are asking “Is this Peter Smith’s voice?”, such as when a person calls the bank and says, “Hello, this is Peter…

Understand SPE technologies, instances and workers

…(or bank branch): A post office is a place providing different kinds of services – one can go there to send letters, send or pick up packages, get a POBox, get some financial services, insurance, etc. Speech Engine has various speech technologies configured – one can analyze the audio quality, extract voiceprints from recordings, compare voiceprints, transcribe audio to text, etc….

STT: What is Preferred Phrases feature and how to use it

…it can help in other applications, too – e.g. when transcribing domain-specific audio, the frequently used domain-specific phrases can be boosted. How preferred phrases work: The picture below shows a simplified standard speech transcription process – the digitized speech signal spectrum is analyzed in the neural network acoustic model (which describes the pronunciations of a given language) and goes into…

Keyword Spotting (KWS)

…to reveal (or “transcribe”) pronunciation directly from actual audio recording. Phoneme Recognizer: Phoneme Recognizer (PHNREC) reveals the phoneme transcription of a specified audio recording, or a part of it. This can be used to get the actual pronunciation of a keyword or phrase as it is actually spoken in the audio recording. This pronunciation can then be used in a keyword list for…

What is User configuration file and how to use it

Advanced users with appropriate knowledge (gained e.g. by taking the Phonexia Academy Advanced Training) may want to fine-tune the behavior of the technologies to adapt to the nature of their audio data. Modifying original BSAPI configuration files directly can be dangerous – inappropriate changes may cause unpredictable behavior, and without a backup of the unmodified file it’s difficult to restore…

Releases and Changelogs (Browser)

…Improved: Audio length in the Results pane longer than 24 hours is now shown in hours instead of days Improved: Browser can now load transcription files that contain spaces in a word instead of ‘+’ signs Fixed: Wrong file suffix when saving transcription on Windows Phonexia Browser 3.59 (Public release) Phonexia Browser 3.59.0, BSAPI 3.59.0 (2023-06-20) New: Transcription can be…

Understand SPE directory structure

…advanced configurations.
bsapi
├── age
├── denoiser
├── diar
├── gid
├── kws
├── lid
├── sid4
├── sqe
├── stt
├── tae
└── vad
Each individual technology directory typically contains three main subdirectories:
data: Technology data, in separate directories for individual technological- or language-specific models
example: Audio files for quick testing, in some cases also in separate directories…

Waveform Denoiser (DENOISER)

…Speech Engine documentation); stream not supported, technology model name to be used for processing. Output: audio file (WAV or RAW), together with xml/json report (in SPE only). Fig.: Comparison of original recording (david_noisy.wav, top half of image) and same recording processed by Denoiser (david_denoised.wav, bottom half of the image). Typical Questions Q: What do you recommend for deploying this technology?…

Q: Do the language-prints (LPs) extracted from audio sources depend on the currently available language pack?

A: The language-prints do not depend on the current language pack used. You may use them for both training a new language pack and testing/comparing against an existing language pack. The language-prints need to be compatible only with the model of LID used for language-print extraction….

Understand SPE user accounts

…data/ └── storage/ └── audio -> /shared_recordings/ In the above example, we created a directory audio in each SPE account’s storage, and symlinked it to a completely different directory shared_recordings. If any of the SPE accounts uploads a file to the audio directory, the file will be accessible by the other SPE accounts. NOTE: When using such trickery, all “the…

Minor Issue

Any scenario that does not fall under the Critical or Severe Issue definitions above. The Product is still operable but contains Issues that occur in a minority of audio files or audio streams or are of a minor nature….

Major Issue

An Issue that renders the Product partially functional, the use of which in a production environment is substantially reduced. The Issue contains an error that impairs the ability of the system to process a majority of audio files or audio streams, or that renders the setup and maintenance of the system inoperable….

Language Identification (LID)

…Routing particular calls (languages) to human operators (language experts) Scoring and results: The LID language pack defines a set of recognizable languages (represented by language models). When identifying the language in an audio recording (or languageprint), LID does the following: creates a languageprint of the recording (if the input is an audio recording), compares that languageprint with each language model in a…

Q: What to do with the ApplicationStartup: Unhandled exception: BsapiException error?

…range (0,91840). A: It means that this opus file was created improperly and declares internally (in its header) much more audio than the file actually contains. Please check your audio source/originator for proper functionality, or use the ffmpeg / sox utility as a preprocessor and normalize the audio by converting it from opus to opus before the recordings are processed through SPE….

Q: What are the recommendations for LID adaptation set?

…pack 20+ hours of audio for each new language model (or 25+ hours of audio containing 80% of speech) Only 1 language per record For adapting the existing language model (discriminative training) 10+ hours of audio for each language May be done on customer site. May be done in Phonexia using anonymized data (= language-prints extracted from a .wav audio)…