Search: speech to text

46 results

Testing possibilities

…(GSM, VoIP,…); Microphone placement (close-field vs. far-field); Audio quality; Formats; Codecs; Background noise; Geographical locations; Age distribution; Style of speech; Monolog vs. dialog; Reading a text vs. live conversation. In some of the scenarios mentioned above, it is quite difficult to meet all of these requirements, which is why the best option for accuracy testing is definitely in…

What is User configuration file and how to use it

…working state. User configuration files provide a way to override processing parameters without modifying the original BSAPI configuration files. WARNING: Inappropriate configuration changes may cause serious issues! Make sure you really know what you are doing. A user configuration file is a plain text file with the same name as the main configuration file, plus the additional extension .usr. For example: Main configuration file…
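The naming rule described in this snippet can be sketched in Python. This is only an illustration: the helper name `user_config_path` and the file name `example.bs` are hypothetical and are not taken from the BSAPI documentation.

```python
from pathlib import Path


def user_config_path(main_config: str) -> Path:
    """Return the user configuration path for a main configuration file.

    Follows the rule stated in the snippet: same name as the main file,
    with the additional extension .usr appended (not substituted).
    """
    main = Path(main_config)
    return main.with_name(main.name + ".usr")


# Hypothetical file name, used only for illustration.
print(user_config_path("settings/example.bs"))
```

Appending the extension, rather than replacing the existing one, keeps the user file next to the main file and makes the pairing obvious from the name.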

Q: How to fix Error 1007: Unsupported audio format?

…ffmpeg utility, powerful and well documented. Please find your distribution package at http://ffmpeg.org. Then continue as described below: Using Phonexia Browser with embedded SPE: Open the Browser configuration dialog by clicking the “Settings” button located in the tool ribbon. Select the “Speech Engine” tab and configure SPE as described in the documentation. Don’t forget to select the “Enable audio converter” checkbox. Using SPE as service/daemon…

Understand SPE home directory

…uploading a file using POST /audiofile physically creates the file on the filesystem in the storage location… and the file stays there until it’s explicitly deleted using DELETE /audiofile. There might be various reasons NOT to use the REST API for uploading files to the Speech Engine, e.g. to save the server from the unwanted burden caused by many uploads and/or big files…

Licensing (technical details)

…all speech technologies and products and may be required in order to use utilities and tools developed by Phonexia or partners. For technical purposes, the License agreement is represented by the license file, which describes the Phonexia technologies or products allowed to be used with that license file. License file The license file is a plain text file named license.dat…

Understand SPE database

…by server.identifier or server.logging.database.identifier configuration settings (see SPE configuration file explained for details); ProcessId: numeric PID of the process which created the log record; Thread: identifier of thread which created the log record; ThreadId: numeric ID of thread which created the log record; Priority: priority of the operation which created the log record; Text: raw log text as it would…

Understand SPE metafiles

Certain SPE entities – SID Speaker models, SID Audio source profiles, LID Language packs – can have additional information associated with them in the form of “metafiles”. This article explains the intended usage of metafiles. In general, SPE is intended as an under-the-hood engine, focusing purely on speech-related audio processing. Any additional functionality should be done on the application layer,…

Download Semantic Search demo

…following Readme.md is part of the docker image too. Semantic search: Phonexia’s text-embedding based semantic search project demo. Prerequisites: installed Docker; 25 GB of free HDD space; 8 GB of RAM; basic experience with using Docker images. Running in Docker: First you’ll need to load the CPU/GPU image from the provided tar file(s): docker load --input phonexia_semantic_search_<cpu/cuda>_<hash>.tar.gz Assuming that you have all…

Phonexia technology models EoL

Speech to Text (STT) and Keyword Spotting (KWS) models Languages supported by Speech To Text and Keyword Spotting Standard = Maintained until newer generation is released, or end of support is reached. Language generation is specified by the number in “Model name”. Language (region) Model name Released End of support Maintenance Arabic (Gulf, Kuwait) AR_KW_6 2022-04 8th gen. Standard Arabic…

Understand SPE executable files

…(in octal format, e.g. 027); pidfile=<path> – Write the application’s process ID (PID) to the specified file. Windows-specific: registerService – Register the application as a Windows service; displayName=<text> – Specify service friendly name (valid only with registerService); description=<text> – Specify service description (valid only with registerService); startup=automatic|manual – Specify service startup mode (valid only with registerService); unregisterService – Unregister the previously…

Understand SPE configuration

…text-based, well commented and human readable. Read these comments carefully, as there are some useful tips and tricks hidden inside. Let’s begin; pay attention to the comment about the variable notation format in the configuration preamble: # This is the default properties file for Phonexia Speech Engine # # Variables: # ${application.dir} path to application directory # ${system.env.<NAME>} system environment…
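To make the variable notation in this preamble concrete, here is a minimal Python sketch of how `${application.dir}` and `${system.env.<NAME>}` placeholders could be expanded. It is not how the Speech Engine resolves them internally; the function name `expand`, the sample property value, and the choice to leave unknown placeholders untouched are all assumptions made for illustration.

```python
import os
import re

# Matches ${...} placeholders such as ${application.dir}
_VAR = re.compile(r"\$\{([^}]+)\}")


def expand(value: str, application_dir: str, env=None) -> str:
    """Expand ${application.dir} and ${system.env.<NAME>} in a value.

    Placeholders that are unknown or refer to an unset environment
    variable are left unchanged (an assumption of this sketch).
    """
    env = os.environ if env is None else env

    def replace(match):
        name = match.group(1)
        if name == "application.dir":
            return application_dir
        if name.startswith("system.env."):
            key = name[len("system.env."):]
            return env.get(key, match.group(0))
        return match.group(0)

    return _VAR.sub(replace, value)


# Hypothetical property value and directory, for illustration only.
print(expand("${application.dir}/data", "/opt/example"))
```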

Arabic dialects in Phonexia LID and STT

TEXT (used for STT language model training): MSA is used in all formal writing, such as official correspondence, literature, newspapers, and webpages, so it is easy to collect large amounts of text, but that text will be more formal and far from spontaneous speech. Support for MSA in Phonexia products Name LID L4 STT Description Arabic (MSA) arb — Modern Standard Arabic,…

Understand SPE connectors for external TTS

…from stdin is as follows: { "text": string, "voice": { "name": string, "languageCode": string } } Where: text is the text to be synthesized; name is a voice name to be used for synthesis (ref. to the voice names provided in the connector “info” data); languageCode is a language code defining the language to be used for synthesis (ref. to…
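The request shape described in this snippet can be produced with a few lines of Python. This is a sketch under stated assumptions: the helper name `build_tts_request`, the voice name `example-voice`, and the language code `en-US` are hypothetical; real voice names and language codes come from the connector “info” data mentioned above.

```python
import json


def build_tts_request(text: str, voice_name: str, language_code: str) -> str:
    """Serialize a synthesis request with the fields named in the snippet."""
    payload = {
        "text": text,
        "voice": {
            "name": voice_name,
            "languageCode": language_code,
        },
    }
    # ensure_ascii=False keeps non-ASCII text readable in the JSON output
    return json.dumps(payload, ensure_ascii=False)


# Hypothetical values, for illustration only.
request = build_tts_request("Hello world", "example-voice", "en-US")
print(request)
```

The resulting string is what a caller would write to the connector's stdin; how the connector process is started is outside the scope of this sketch.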