The isSpeaking signal would only be generated when the actual audio playback
started, but this could be several seconds for TTS engines like Mimic which
take some time to generate the audio file for playback. This changes the
creation of the "isSpeaking" signal to the start of the execute() method,
which should queue up audio and leave the signal set until the queue has
eventually been cleared.