Dictation > Apple's dictation

cxo · May 23, 2024, 7:25am

Folks, call me lazy, but I love to dictate text on my iPhone and Mac.

Call me a man with slurred pronunciation and fast speech, too: when it comes to ADHD, I play in the Champions League. So when dictating, it’s important that the software understands me.

The dictation function on my Mac (M1 Air) and my iPhone (13 Pro) is superbly integrated. It can be activated anywhere, in any program, at the touch of a button, and it writes wherever the cursor is, without copying and pasting. The only problem is that it constantly misunderstands what I say, invents absurd words, "corrects" correctly spelled words, cuts off mid-sentence, produces completely strange errors, learns absolutely nothing and is completely immature. But it’s integrated really well. The result is that the time saved is eaten up by the extensive manual corrections afterwards.

Some time ago, I discovered Whisper, the speech recognition software from OpenAI. It is available as part of ChatGPT and in the form of many apps. And it works really well for me with English and my native German.

However, nothing I’ve found that is based on Whisper is as nicely integrated as Apple’s crappy dictation function.

Do you know of anything suitable?

suavito · May 23, 2024, 9:51am

I’d say let’s wait a bit for the next iteration of the OSes. WWDC starts on the 10th of June, and we will find out what we can get our hands on in late summer or autumn.

Over the past few months, Apple representatives have been hinting at an upcoming AI integration and at how well equipped the new devices are with their neural engines, etc. If they really mean it, this must include much better speech-to-text recognition.

Even in German. Which is also my native language, and I can assure you: It’s not your pronunciation and how fast you speak. It’s the crappy algorithms.

I use the Apple Watch a lot for dictation. And when I’m out and about using mobile data, there’s often a noticeable delay between what happens on the device and what happens off it: First I see what I’ve dictated, a moment later the dictated text that’s been sent to Apple’s servers comes back—but verschlimmbessert, as we say in German (“improworsened”, something that’s meant to improve something but actually worsens it).

AmberV · May 23, 2024, 3:52pm

Yes, while dictation is not a new technology, the advances made in the past five years or so have been notable, and those of the past couple of years even more so. Both speech recognition and text-to-speech are in the middle of a big leap forward, and I suspect that will end up in operating systems soon.

cxo · May 24, 2024, 10:57am

As we are talking about AI-based speech recognition… integrating the API of a writing assistant like DeepL Write might be useful, too. In my job, I use AI modules like that very often, and they speed up tedious jobs quite efficiently. They are much better than spell checkers like the one in Word.