During my latest experiments for LocJams (and on some very linear projects), I had the chance to have some fun with MemoQ's speech-to-text tool. Not everything always goes as planned, and the results are sometimes too hilarious to keep to myself.

In this project, you'll find the story as I translated it and then dictated it to the MemoQ tool. Hover over the highlighted words to see what they should have been.

 I left Rozaine as Rosen and Nelly as Nelli; I mean, that's as right as MemoQ could get them!

I've also recorded a video to prove that no, I'm not speaking gibberish: it's really the tool doing its own thing and turning proper clean Italian into random nonsense. 👇 Not that interesting, though; basically, it’s an audiobook, but with all the punctuation marks spoken out loud. Yuck!


_________________________________________

Italian has the advantage of being an extremely straightforward language, without too many homophones, so translating by dictating to a voice recognition tool should be a piece of cake, right?

Additionally, I can boast of not having any particular accent and having a fairly clear pronunciation, which gives me even more reason to believe that everything will go smoothly.

And yet, as you can see from this text, all of this still isn't enough to produce a decent text using MemoQ's speech-to-text feature. (I admit I’ve never tried Dragon Dictation or other similar tools, my bad, but hey, this is the official one, so it should be even better!)



In this translation for the LocJam Crime Story, I won’t dwell on trivialities like inconsistent vowel endings and the like; life’s too short for that.

However, you can still appreciate all the inaccuracies of the dictation tool, highlighted in italics, and the most hilarious blunders, underlined: those errors that really make you wonder, "MemoQ, maybe you should get a hearing test!"

I decided to group the errors by category and to list and analyze them in various posts that you can find in the Devlog. Enjoy!

__________________________________________

This endeavor was the result of 8 long hours on a train back from Gamescom, as well as a few hours in a Paris hotel during an overnight stay due to unforeseen circumstances (also called "inefficient rail connections").

There are minimal differences between the final text and the one used in the analysis, mostly because the last revision was done after the audio recording.

__________________________________________

What’s the moral of the story? I don’t think there is one, but the question naturally arises...

In recent years, we've witnessed the huge improvements made by speech-to-text technology, from YouTube videos with almost accurate subtitles despite accents and slang, to tools for summarizing online meetings. These are existing and constantly improving technologies that, I assume, can replicate existing speech patterns and assess the likelihood of a particular syntactic element being more plausible in a certain position in a sentence, or perhaps even maintain some consistency to avoid messing up genders, numbers, or verb tenses.

So: "Why, then, are these technologies being exploited only for profit instead of providing us professionals (especially those with disabilities that might reduce typing agility) with tools that facilitate our daily work?"

🤷🏻‍♀️

_________

Want to read the "proper" translation? It's here!
👉 https://loqace.itch.io/a-san-lupin/devlog/791513/the-proper-manual-translation
Want to check the other project for LocJam Crime Story I participated in?
👉 https://exquisite-gameloc-ita-mob.itch.io/cadaveri-squisitamente-tradotti
Background: Image by GarryKillian on Freepik
