
Search: n-best

40 results

STT: Results explained

n-best output (since version 3.30) The n-best results are updated after each segment/sentence, i.e. they are available in the output only when an end-of-segment boundary (</segment> token) is encountered in the one-best output. Examples Examples of new generation and legacy file processing Speech To Text outputs: … { "channel_id" : 0, "score" : 0, "confidence" : 0, "start" : 0, "end" :…

Speech to Text (STT)

…the word correctly and when technology evaluates the best result as not matching what was really said. The Confusion network result type provides output similar to n-best, except that segments are returned word by word. Usage of confusion network is the same as of n-best. Training of new models To create new model of STT about 100…

Releases and Changelogs (SPE)

…Different/incorrect output for empty streams (fixed only in CS_CZ_6 and SK_SK_6 for now) Fixed: STT: Space is not allowed as separator in wordlist file in CLI interface Fixed: STT: Still incorrect timestamp values in N-best output of stream transcription Fixed: STT: Extra “+” character shown in Confusion Network output Fixed: STT: A “+” character gets removed from wordlist backup file…

STT: What is Words-To-Numbers feature and how to use it

…point zero three ⇒ 1586.03 sixty four million seven hundred thousand ninety ⇒ 64700090 This should help to simplify processing of the transcribed texts by text analytic layers or NLP (Natural Language Processing) engines, e.g. in voicebot applications. Where is the converted output available? The words to numbers conversion is available only in n-best output (i.e. where the entire sentence…

STT: How to properly convert Confusion Network results to One-best

…word alternatives: The recommended algorithm for converting Confusion Network (CN) to One-best is as follows: loop through all CN timeslots from start to end; in each timeslot, get the input alternative with the highest score and, if it's not <null/> or _DELETE_, add the input alternative at the end of your output; then, loop through all alternatives in your output for…
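The first part of the algorithm described in this snippet can be sketched as follows. This is a minimal illustration, not the documented implementation: the input layout (a list of timeslots, each a list of dictionaries with `word` and `score` keys) and the function name `cn_to_one_best` are assumptions made for the example, and the second pass, which is cut off in the snippet, is deliberately not implemented.

```python
# Tokens that mark "no word here"; taken from the snippet above.
SKIP_TOKENS = {"<null/>", "_DELETE_"}


def cn_to_one_best(timeslots):
    """Pick the top-scoring alternative of each timeslot, in order.

    `timeslots` is an assumed layout: a list of timeslots, where each
    timeslot is a list of {"word": str, "score": float} dictionaries.
    """
    output = []
    for slot in timeslots:
        if not slot:
            # Nothing to choose from in an empty timeslot.
            continue
        best = max(slot, key=lambda alt: alt["score"])
        if best["word"] not in SKIP_TOKENS:
            output.append(best["word"])
    return output


example = [
    [{"word": "hello", "score": 0.9}, {"word": "hollow", "score": 0.1}],
    [{"word": "<null/>", "score": 0.7}, {"word": "a", "score": 0.3}],
    [{"word": "world", "score": 0.6}, {"word": "word", "score": 0.4}],
]
print(cn_to_one_best(example))
```

In the second timeslot the top-scoring alternative is `<null/>`, so nothing is emitted for it; the lower-scoring word is not promoted in its place.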

Understand SPE executable files

…See POST /audiofile endpoint documentation for details. phxclient: example 2 phxclient /login=admin /password=phonexia /method=GET /uri="127.0.0.1:8600/technologies/stt/?path=/myfile.wav&model=en_us_6&result_type=one_best,n_best&cache_disable=true" ./phxclient --login=admin --password=phonexia --method=GET --uri="127.0.0.1:8600/technologies/stt/?path=/myfile.wav&model=en_us_6&result_type=one_best,n_best&cache_disable=true" Process myfile.wav file stored in the root of SPE internal storage – e.g. uploaded using the previous example – using the Speech To Text (STT) technology model EN_US_6 (6th generation English), returning one_best and n_best result types, and disabling any…
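The request URI used in this example can also be assembled programmatically. The sketch below only builds the string shown in the snippet; the helper name `build_stt_uri` is hypothetical, the query parameter names are copied from the snippet, and the `false` value for `cache_disable` is an assumption (only `true` appears in the source).

```python
from urllib.parse import urlencode


def build_stt_uri(host, path, model, result_types, cache_disable=True):
    """Build an STT request URI like the one in the phxclient example.

    Hypothetical helper: it only formats a string and sends nothing.
    """
    params = {
        "path": path,
        "model": model,
        # Several result types are passed as one comma-separated value.
        "result_type": ",".join(result_types),
        # "false" is an assumed counterpart of the "true" seen in the snippet.
        "cache_disable": "true" if cache_disable else "false",
    }
    # Keep "/" and "," unescaped so the output matches the snippet's form.
    query = urlencode(params, safe="/,")
    return f"{host}/technologies/stt/?{query}"


uri = build_stt_uri(
    "127.0.0.1:8600", "/myfile.wav", "en_us_6", ["one_best", "n_best"]
)
print(uri)
```

The resulting string can be passed as the `uri` argument of `phxclient`, quoted so that the shell does not interpret the `&` characters.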

Sizing of the computing units for speech technologies

Best practices for good sizing of Phonexia technologies depend on a few facts: Intense work with large data sets requires good performance and bandwidth between RAM and CPU. It all depends on the size of the files with technological models data, usually loaded into RAM and used intensively for computing operations Always think only about physical cores of CPU (HT,…

Understand SPE configuration

…good CPU utilization Virtual machines should be configured after careful consideration about performance RAM speed is more important than CPU clock frequency Because large amounts of data (statistical models) are loaded to RAM, the pipe bandwidth between RAM and CPU is IMPORTANT L3 cache (shared between CPU cores) is a key player. TIP: For best CPU utilization on a single…

Designing and Developing Application

…measure evaluation results and how to process calibration? Etc. We encourage Partner to become familiar also with the following points: Phonexia Speech Engine features and list of the technologies Best practices - typical processing flows and architecture from our previous projects Database schema Other Phonexia components and tools as example applications that can give you inspiration Licensing possibilities of the Phonexia…

STT: Language Model Customization tutorial

…copy of the word list file, as a backup) – see below for the best location for usage in Speech Engine Using customized STT model in Speech Engine STT To use a customized STT model in Speech Engine STT, it's necessary to place the customized model in the correct location, so that Speech Engine can find it register and enable the customized…

Arabic dialects in Phonexia LID and STT

…for each – North Levantine (apc) and South Levantine (ajp). Our models were trained using data from both varieties, therefore we followed RFC 5646, section 2.2.4 and created a custom language code ar-XL, where the XL means "cross-Levantine" 😉 NOTE: To get the best STT results, use the model that corresponds to the given dialect. The AR_XL_* model is best suited for…

Licensing (technical details)

…via internal network only (as opposed to standard NET licenses which communicate with the Phonexia licensing server via Internet). This way, FLS provides "the best of both worlds" – the flexibility of a NET license, while keeping the ability to work in isolated environments without an Internet connection. FLS requirements Phonexia FLS is a lightweight command-line utility, requiring 64-bit Linux OS (no…