
Search: speech to text manual

17 results

Releases and Changelogs (SPE)

…recordings Fixed: Unable to initialize technologies when SPE is launched using UNC path on Windows Speech Engine 3.57 (Public release) Speech Engine 3.57.0, DB v1901, BSAPI 3.57.0 (2023-02-01) New: AGE XL4 and XL5 models (for compatibility with SID4 XL4 and XL5 voiceprints) Speech Engine 3.56 (Public release) Speech Engine 3.56.0, DB v1901, BSAPI 3.56.0 (2022-12-15) New: GID XL5 model (for…

Release Notes

Speech Platform. We improved and released mainly Phonexia Speech Engine v3.50 (SPE, REST API). Major Changes: New Features and Fixes Speech Engine: Speech to Text (STT) We have several exciting new features relevant to STT and KWS technologies. Both technologies are part of the Speech Engine (SPE) component: Spanish (General) Model Released (Tech. Model Name: ES_6) It is an upgrade…

FAQs (PSP)

…Q: What is the difference between on-the-fly and off-line speech to text (STT) transcription? A: Like a human listener, the ASR (STT) engine adapts to the acoustic channel, environment and speaker. The engine also learns more about the content over time, and this information is used to improve recognition…

STT: Language Model Customization tutorial

The Language Model Customization tool (LMC) provides a way to improve Speech To Text performance by creating a customized language model. The language model is an important part of Phonexia Speech To Text. Put simply, it can be imagined as a large dictionary with multiple statistics. The Speech To Text technology uses this dictionary and statistical model to convert audio…

SPE and Browser installation: standalone SPE

…Quality Estimation Stream [disabled] 17) Speech To Text [disabled] 18) Speech To Text Input Stream [disabled] 19) Time Analysis [disabled] 20) Time Analysis Stream [disabled] 21) Voice Activity Detection [disabled] 22) Voice Activity Detector Stream Technology [disabled] 23) Enable all 24) Disable all 0) Quit Choose technology to configure [0]:23 Select the option to Enable all technologies (usually the option…

STT: What is Preferred Phrases feature and how to use it

…e.g. “WiFi” vs. “HiFi”, “cell” vs. “sell”, “eighty machines” vs. “eight tea-machines” etc. Usually, the language model part of Speech To Text does its job and prefers the correct word in the context of a longer phrase or an entire sentence: × I’m going to cell my car. Hmmm, such a sentence does not sound like common English… √ I’m going to…
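As a rough illustration of how a language model can prefer one homophone over another from context, the following toy sketch scores candidate sentences with an add-one smoothed bigram model. The tiny corpus and the names `train_bigrams`, `score` and `pick_best` are invented for this example; this is not the Phonexia implementation, only a minimal picture of the idea.

```python
from collections import Counter

# Toy training text (an assumption for this sketch, not Phonexia data).
CORPUS = [
    "i am going to sell my car",
    "i want to sell my house",
    "the cell phone is on the table",
    "she will sell her car",
]

def train_bigrams(sentences):
    """Count single words and adjacent word pairs, with a start marker."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def score(sentence, unigrams, bigrams, vocab_size):
    """Probability of a sentence under an add-one smoothed bigram model."""
    words = ["<s>"] + sentence.split()
    probability = 1.0
    for prev, cur in zip(words, words[1:]):
        probability *= (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
    return probability

def pick_best(candidates, sentences=CORPUS):
    """Return the candidate transcription the toy model finds most likely."""
    unigrams, bigrams = train_bigrams(sentences)
    vocab_size = len(unigrams)
    return max(candidates, key=lambda c: score(c, unigrams, bigrams, vocab_size))

if __name__ == "__main__":
    print(pick_best(["i am going to cell my car", "i am going to sell my car"]))
```

Because "to sell" and "sell my" occur in the toy corpus while "to cell" and "cell my" do not, the model ranks the "sell" variant higher, which mirrors the behaviour described in the snippet.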

Understand SPE configuration file

In this article we explain the details of the Speech Engine configuration file phxspe.properties, located in the settings subdirectory of the SPE installation location. Settings in this configuration file affect the Speech Engine behavior and performance. The configuration file is usually created after SPE installation: on first use of phxadmin, a default configuration file phxspe.properties is created in the settings directory. The file…

Understand SPE database

When Speech Engine is used together with Phonexia Browser in so-called “embedded” mode (see details about “embedded SPE” mode in the Browser manual), Phonexia Browser creates its own separate SPE configuration file, and the SQLite database file is located in the SPE home directory and named phxserver.sqlite. This might be important in certain scenarios, e.g. when registering LID language pack using phxadmin –…

Download Speech Platform

…only English models for Speech To Text and Keyword Spotting. Additional supported languages are available upon request. Speech Engine – technologies included: Speech To Text (STT) – model EN_US_6 (US English) Keyword Spotting (KWS) – model EN_US_6 (US English) Phoneme Recognizer (PHNREC) – model EN_US_6 (US English) Speaker Identification 4 (SID4) – model…

Understand SPE directory structure

…for individual models settings BSAPI configuration files (*.bs) and optionally manually created user configs (*.bs.usr). There is one exception – LID – which has two additional directories containing pre-built languageprint archives (*.lpa) and language packs: lprints and models. The schemes below show examples of directories for GID (Gender Identification), STT (Speech To Text) and LID (Language Identification): – GID and LID…

FAQs (Browser)

…Browser. Q: What languages do you offer? A: It depends on the technology. Phonexia Language Identification (LID) is pre-trained for 60+ languages. Phonexia Keyword Spotting (KWS) and Phonexia Speech Transcription (STT) are available for 20+ languages, including English, French, German, Russian, Spanish and many more. Q: What…

Phoneme Recogniser (PHNREC)

…user can add to the language model of speech-to-text technology (better accuracy of KWS technology). Input: audio file (for format details see the Speech Engine documentation; stream not supported) and technology model name (i.e. language code) to be used for phoneme transcription. Output: in the process of transcribing speech to phonemes, the Phoneme Recogniser usually identifies individual speech segments and converts them to pronunciation. Example…

Arabic dialects in Phonexia LID and STT

TEXT (used for STT language model training): MSA is used in all formal writing such as official correspondence, literature, newspapers and webpages, so it is easy to collect large amounts of text, but that text will be more formal and far from spontaneous speech. Support for MSA in Phonexia products: Name LID L4 STT Description Arabic (MSA) arb — Modern Standard Arabic,…

Understand SPE configuration

text-based, well commented and human readable. Read these comments carefully, as there are some useful tips and tricks hidden inside. Let’s begin; pay attention to the comment about the variable notation format mentioned in the configuration preamble: # This is the default properties file for Phonexia Speech Engine # # Variables: # ${application.dir} path to application directory # ${system.env.<NAME>} system environment…
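To make the `${...}` variable notation from the preamble more concrete, here is a minimal sketch of how such placeholders could be expanded in a property value. It assumes only the two forms visible in the snippet, `${application.dir}` and `${system.env.<NAME>}`; the function name `expand` and its parameters are hypothetical, and Speech Engine's actual resolution logic may differ.

```python
import os
import re

# Matches ${...} placeholders such as ${application.dir}.
_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")

def expand(value, application_dir, env=None):
    """Expand ${application.dir} and ${system.env.<NAME>} in a string.

    Hypothetical helper for illustration only. Unknown placeholders and
    unset environment variables are left untouched.
    """
    env = os.environ if env is None else env

    def replace(match):
        name = match.group(1)
        if name == "application.dir":
            return application_dir
        if name.startswith("system.env."):
            key = name[len("system.env."):]
            return env.get(key, match.group(0))
        return match.group(0)

    return _PLACEHOLDER.sub(replace, value)

if __name__ == "__main__":
    # Example paths are made up for the demonstration.
    print(expand("${application.dir}/settings", "/opt/spe"))
    print(expand("${system.env.HOME}/logs", "/opt/spe", {"HOME": "/home/user"}))
```

Leaving unresolved placeholders in place, rather than raising an error, is a design choice made here so that a misconfigured value stays visible in the output.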