Search: stt

43 results

Understand SPE technologies configuration file

…technologies.xml file containing the following setup: STT (Speech To Text) with 8 instances of SK_SK_5 model STT_STREAM (Speech To Text for stream processing) with 2 instances of CS_CZ_6 model SID4E (Speaker Identification 4 Voiceprint Extractor) with 2 instances of L4 model 3 instances of XL4 model SID4C (Speaker Identification 4 Voiceprint Comparator) with 2 instances of L4 model 3 instances…

Releases and Changelogs (Browser)

…Fixed multi-channel recordings might not be processed by STT for the first time Added “Copy text” to context menu of STT widget in Waveform editor Support for SPE 3.5.x Phonexia Browser v3.4.0, BSAPI 3.8.0 – Sep 21 2016 Fixed do not show second label panel in waveform editor when double-click on result Fixed import of the speaker models may get…

Understand SPE workers configuration

…CPU cores in the server. Example: Czech STT on stream is approx. 4 times faster than realtime, i.e. 1 CPU core can process 4 realtime streams simultaneously. So a server with 8 CPU cores running only STT stream can be configured as follows: keep 1 core dedicated for operating system and SPE remaining 7 cores can handle 28 realtime workers…

Q: Can I add words into dictionary?

A: Yes, you can use Language Model Customization (LMC). For more details please read STT Language Model Customization tutorial….

FAQs (PSP)

…Browser, FAQ Speech Platform Permalink Q: What is the difference between on-the-fly and off-line type of speech to text transcription (STT)? A: Similarly as human, the ASR (STT) engine is doing the adaptation to an acoustic channel, environment and speaker. Also the ASR (STT) engine is learning more information about the content during time, that is used to improve recognition….

Understand SPE executable files

…See POST /audiofile endpoint documentation for details. phxclient: example 2 phxclient /login=admin /password=phonexia /method=GET /uri=”127.0.0.1:8600/technologies/stt/?path=/myfile.wav&model=en_us_6&result_type=one_best,n_best&cache_disable=true” ./phxclient –login=admin –password=phonexia –method=GET –uri=”127.0.0.1:8600/technologies/stt/?path=/myfile.wav&model=en_us_6&result_type=one_best,n_best&cache_disable=true” Process myfile.wav file stored in the root of SPE internal storage – e.g. uploaded using the previous example – using the Speech To Text (STT) technology model EN_US_6 (6th generation English), returning one_best and n_best result types, and disabling any…

Q: What languages do you offer?

It depends on the technology. Phonexia Language Identification (LID) is pre-trained for 60+ languages. Phonexia Keyword Spotting (KWS) and Phonexia Speech Transcription (STT) for 20+ languages including English, French, German, Russian, Spanish and many more….

Key Features (PSP)

…recording, Speech to Text (STT) – several languages supported – converts speech into plain text (words or sentences) automatically, Keyword Spotting (KWS) – several languages supported – detects specific keywords/phrases automatically without conversion to text, Gender identification (GID) – identifies whether a speaker is male or female, Age Estimation (AGE) – estimates the speaker´s age group, Voice Activity Detection (VAD)…

Phonexia Speech Engine

Phonexia Speech Engine (SPE) is main part of Phonexia Speech Platform. SPE is a server application for 64-bit Linux or Windows, providing REST API to entire portfolio of Phonexia speech technologies. SPE capabilities overview: Audio files and stream processing Audio files RTP / HTTP streams Speaker Identification (SID) ✓ ✓ Speech To Text (STT) ✓ ✓ Keyword Spotting (KWS) ✓…

Speech Engine update

…up to you, based on the actual content of the directory and your new package NOTE: If you created any user configuration files, or made any changes in configuration files, make sure to keep the respective .bs.usr or .bs files! If you created any customized STT language models using LMC, it’s recommended practice to recreate the STT model using the…

Understand SPE database

…results – file, used technology model, used speaker model, used FAR calibration set, max. FAR, results JSON data rest_result_sid4 SID4 processing results – file, used technology model, used speaker model, used file- and speaker model Audio Source Profile, results JSON data rest_result_sqe SQE processing results – file, used technology model, results JSON data rest_result_stt STT processing results – file, used…

FAQs (Browser)

…details, see KWS technology documentation. in FAQ Phonexia Browser, FAQ Speech Platform Permalink Q: What languages are supported by STT? A: Please see List of supported STT Languages. For more details, see STT technology documentation. in FAQ Phonexia Browser, FAQ Speech Platform Permalink Q: I am getting SPE related error after starting the Browser (e.g. SPE server crashed, Error Downloading…,…

Recommended OS and HW (PSP)

…Intel® Core Processor RAM: 16 GB Storage: 100 GB (depends on your audio retention policy) SSD strongly recommended for superior performance over HDD Configuration includes: STT 6th generation – 2 languages (half load each), KWS 6th generation – 2 languages, LID L4, VAD, SQE Voice Biometrics + Transcription System, basic 100 hours/day package (***) files processing CPU: 14 physical cores,…

Phoneme Recogniser (PHNREC)

…output) Note: The outputs can contain the following special tokens: sil silent part (or no speech detected) The list of phonemes is available in the document phonemes_for_stt_and_kws.pdf (delivered as part of manuals in SPE or STT or KWS). Languages Supported List of supported languages in Phoneme Recogniser is same as in Keyword Spotting. Link to API reference https://download.phonexia.com/docs/spe/#%2Ftechnologies%2Fphnrec…

Understand SPE benchmark

…120.wav ├── 150.wav ├── 180.wav ├── 210.wav ├── 240.wav ├── 270.wav └── 300.wav For majority of technologies, the content of default directory is used for the benchmarking. Benchmarking of the language-specific technologies – STT (Speech To Text) and PHNREC (Phoneme Recognizer) – first tries to find a directory with a name matching the start of the benchmarked model name and…