A Docker image with a Speech-to-Text webapp for Kazakh

Usage

download the kazakh-stt-cpu.7z file that we are providing
extract it (you might need to install 7-zip)
open a terminal
in your terminal, run: docker load < kazakh-stt-cpu.tar
run: docker run -dp 8000:8000 taruen/kazakh-stt-cpu:latest
open a browser and go to the following page: http://localhost:8000/servlets/standalone.rkt
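
The steps above can be sketched as one small script. This is only an illustration: it assumes that the 7z and docker commands are on your PATH, that curl is installed, and that the archive is in the current directory. The helper names run and stt_setup and the DRY_RUN switch are made up for this sketch and are not part of the image. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the usage steps. DRY_RUN=1 (the default here) only prints each
# command; set DRY_RUN=0 to actually execute them.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

stt_setup() {
    # Extract the archive; per the steps above this yields kazakh-stt-cpu.tar
    run 7z x kazakh-stt-cpu.7z
    # Equivalent to: docker load < kazakh-stt-cpu.tar
    run docker load -i kazakh-stt-cpu.tar
    # Start the container in the background and publish port 8000
    run docker run -d -p 8000:8000 taruen/kazakh-stt-cpu:latest
    # Print the HTTP status code of the webapp page (the server may need
    # a little time to start before this succeeds)
    run curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8000/servlets/standalone.rkt
}

stt_setup
```

To execute the commands instead of printing them, run the script with DRY_RUN=0 set in the environment.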

The attached screenshot shows what that page should look like.

Limitations

This is a CPU-only image: transcription runs without a graphics processing unit (GPU), so it is not fast. For example, on a Lenovo ThinkPad T440p laptop (Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz, 12 GB of RAM) it takes about 6 minutes to recognize 30 seconds of speech.
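
For a rough idea of how long a longer recording might take, the figure above can be scaled up. This is a back-of-the-envelope sketch only: it assumes that processing time grows linearly with audio length, which has not been measured, and it uses the single data point above (about 360 seconds of processing per 30 seconds of speech). The function name estimate_seconds is made up for this illustration.

```shell
# Rough estimate: audio length in seconds -> expected processing time in seconds.
# Assumes linear scaling from the one measurement above (360 s per 30 s of speech).
estimate_seconds() {
    audio_seconds=$1
    echo $(( audio_seconds * 360 / 30 ))
}

estimate_seconds 60    # prints 720, i.e. about 12 minutes for one minute of speech
```

On faster or slower hardware the ratio will differ, so treat the result as an order-of-magnitude guide.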

GPU-enabled images will be provided later, so stay tuned.

Acknowledgements

Our product uses the ISSAI Kazakh Speech Corpus (https://doi.org/10.48342/gkg9-gn84), which is available under a Creative Commons Attribution 4.0 International License.

Source code / building blocks

ESPnet, Kaldi, ISSAI_SAIDA_Kazakh_ASR.

A Docker image with a Speech-to-Text app for Kazakh that converts .wav files containing Kazakh speech into text.

Input files: .wav
Output: plain text in the browser
Supported languages: Kazakh
GPU support: no, this container is CPU-only

30-day money-back guarantee