Please fix Georgian STT

Hi, only a few STT platforms have Georgian in their language list, so I was happy to see that you support my language.

Unfortunately, your STT does not work properly for Georgian.
Please see the example here:
https://transcript.lol/read/youtube/@khanacademykartuli/6601453ad2f24aea73b1f72b?view=transcript

The first issue is the script. Most of the transcript is written in Latin letters, as if it were a transliteration. Where it did use Georgian script, the output is just a jumble of letters (these are not proper words that are merely missing spaces).
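A rough way to quantify the script problem is to count how many letters in a transcript are Georgian versus Latin. The following is only a minimal sketch: the function names `script_counts` and `georgian_ratio` and the sample string are made up for illustration, and it relies on Python's standard `unicodedata` module rather than anything from the STT service itself.

```python
import unicodedata
from collections import Counter


def script_counts(text: str) -> Counter:
    """Count letters by script, based on their Unicode character names."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, spaces, punctuation
        name = unicodedata.name(ch, "")
        if name.startswith("GEORGIAN"):
            counts["georgian"] += 1
        elif name.startswith("LATIN"):
            counts["latin"] += 1
        else:
            counts["other"] += 1
    return counts


def georgian_ratio(text: str) -> float:
    """Share of letters that are Georgian; 0.0 if the text has no letters."""
    counts = script_counts(text)
    total = sum(counts.values())
    return counts["georgian"] / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical sample: one word in Georgian script, one transliterated.
    sample = "გამარჯობა gamarjoba"
    print(script_counts(sample))
    print(georgian_ratio(sample))
```

For a Georgian-language video, a correct transcript should give a ratio close to 1.0, so a low value would flag the transliteration problem described above.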

Looking through the Latin text, though, it seems to have done an OK job overall. The screenshot shows the main issues.


The issues are that it breaks words apart where there is no need to (the + signs) and merges some words that should be separate (the - signs). It also misses some sounds (the added letters) or mishears them (the crossed-out letters with additions).

Could you please fix these?

If you need help with crowd-sourcing or labeling correct and incorrect output to train the AI, I can share the task with my community and we will contribute free of charge. If you gift some credits to the contributors, more will join, but a few of us, including me, would contribute for free.

Our community volunteers to support the development of Georgian AI by crowd-sourcing data. Our current goal is to help develop Georgian STT. We contribute to the Common Voice open-source dataset: we have added 200+ hours to the dataset and continue to grow it.
https://commonvoice.mozilla.org/en/datasets

Status: In Review
Board: Feedback
Date: About 2 months ago
Author: rba
