Hmm, on a Mac it might be possible to simplify this by using the system's dictation (I have no idea whether there's an API for that, probably not, but maybe there's a way to hack it) and the system's voice. It might then be even faster, and would mostly just require a Wolfram|Alpha API key.
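For the "system's voice" half, a rough sketch of what that could look like, assuming the macOS `say` command-line tool is on the PATH. The helper names `build_say_command` and `speak` are made up for illustration and are not part of the project being discussed. The text is passed on standard input, which `say` reads when no text argument is given. The dictation half is not covered here.

```python
import shutil
import subprocess


def build_say_command(voice=None):
    """Build the argument list for macOS `say` (hypothetical helper)."""
    cmd = ["say"]
    if voice:
        # -v selects one of the installed system voices, e.g. "Alex".
        cmd += ["-v", voice]
    return cmd


def speak(text, voice=None):
    """Speak `text` with the system voice.

    Returns False without doing anything when `say` is not installed
    (for example on Linux), True after the text has been spoken.
    """
    if shutil.which("say") is None:
        return False
    # Text goes in on stdin, so it is never mistaken for an option.
    subprocess.run(build_say_command(voice), input=text, text=True, check=True)
    return True
```

Calling `speak("The answer is 42")` would block until the speech finishes, so a real assistant would probably want to run it in a background thread or process.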
Yes, it does: SAPI[1]. There are a few for Linux, but unfortunately nothing that has become any kind of standard. Festival tends to have the best voices, but in my experience it's a bit of a pain to use inside another program.
It originally used Festival, but it was a pain to install on the Pi (which I want this to run on), and the voices I tried weren't as easy to understand. I'd love for it to be an option, though.
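One way an optional engine could be wired in, as a sketch only: try a list of engines in order and use the first one that is installed. This assumes the `festival` and `espeak` command-line tools, both of which can read the text to speak from standard input (`festival --tts`, and `espeak` with no text argument). The `BACKENDS` table and the `pick_backend` and `speak` helpers are hypothetical names, not the project's actual code.

```python
import shutil
import subprocess

# Hypothetical table of engines; each command reads its text from stdin.
BACKENDS = {
    "festival": ["festival", "--tts"],
    "espeak": ["espeak"],
}


def pick_backend(preferred=("festival", "espeak"), which=shutil.which):
    """Return the first engine in `preferred` that is installed, else None.

    `which` is injectable so the selection logic can be checked
    without any speech engine present.
    """
    for name in preferred:
        if which(name):
            return name
    return None


def speak(text, backend):
    """Speak `text` by piping it into the chosen engine's stdin."""
    subprocess.run(BACKENDS[backend], input=text, text=True, check=True)
```

On a Pi without Festival, `pick_backend()` would fall through to `espeak` if that is installed, so Festival stays an option rather than a requirement.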