The text-to-speech feature refers to the spoken narration of text displayed on a device. Devices such as laptops, tablets, and mobile phones already have this feature, and any application running on them, such as a web browser, can make use of it to extend its own functionality. Narration can be a useful aid for an application that displays a lot of text, as it gives website visitors the option of listening.
The Web Speech API
Initial code & support check
To get started, let’s create a web page with some sample text to be narrated, and three buttons.
The buttons will be the controls for the narration. Now we need to check whether the user agent (UA) supports speech synthesis, that is, whether the window object has the 'speechSynthesis' property or not.
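The support check described above can be sketched roughly as follows. The helper name `supportsSpeechSynthesis` is made up for this illustration and takes the global object as a parameter so it can also be tried outside a browser.

```javascript
// Returns true when the given global object exposes the speechSynthesis property.
// (Hypothetical helper name, not part of the Web Speech API.)
function supportsSpeechSynthesis(globalObject) {
  return typeof globalObject === 'object' &&
    globalObject !== null &&
    'speechSynthesis' in globalObject;
}

// In a browser, `window` is the global object; elsewhere (e.g. Node.js) it is undefined.
const root = typeof window !== 'undefined' ? window : undefined;

if (supportsSpeechSynthesis(root)) {
  console.log('Speech synthesis is available.');
} else {
  console.log('Speech synthesis is not supported in this environment.');
}
```

In a page script, the same test is often written inline as `if ('speechSynthesis' in window) { ... }`.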
If speechSynthesis is available, we first create a reference to speechSynthesis and assign it to the synth variable. We also initialize a flag with the value false (we’ll see its purpose later in the post), and we create references and click event handlers for the three buttons (Play, Pause, Stop) as well.
When the user clicks one of the buttons, its respective function (for example, onClickStop()) will be called.
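As a rough sketch of this setup step, the snippet below wires one click handler to each button. The element ids (`play`, `pause`, `stop`), the helper `wireControls`, and the placeholder handlers are assumptions made for illustration; the real page markup and handler bodies may differ.

```javascript
// Hypothetical element ids for the three buttons; adjust to the page's markup.
const BUTTON_IDS = { play: 'play', pause: 'pause', stop: 'stop' };

// Attach one click handler per button and return the names that were wired.
function wireControls(doc, handlers) {
  const wired = [];
  for (const [name, id] of Object.entries(BUTTON_IDS)) {
    const button = doc.getElementById(id);
    if (button && typeof handlers[name] === 'function') {
      button.addEventListener('click', handlers[name]);
      wired.push(name);
    }
  }
  return wired;
}

if (typeof window !== 'undefined' && 'speechSynthesis' in window) {
  const synth = window.speechSynthesis; // reference used by the handlers
  let flag = false; // meant to become true once Play has started a narration

  // Placeholder handlers that only log; they stand in for the real functions.
  wireControls(document, {
    play: () => console.log('Play clicked; flag is', flag),
    pause: () => console.log('Pause clicked; speaking:', synth.speaking),
    stop: () => console.log('Stop clicked; speaking:', synth.speaking),
  });
}
```

Passing the document and the handlers in as arguments is only a convenience for trying the wiring without a browser; a page script could just as well call `addEventListener` on each button directly.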
Create the custom functions
Now let’s build the click functions of the three individual buttons that will be called by the event handlers.
When the Play button is clicked, we first check the flag. If it’s false, we set it to true, so that if the button is clicked again later, the code inside the first if condition won’t execute (not until the flag is false again).
Then we create a new instance of the SpeechSynthesisUtterance interface, which holds information about the speech, such as the text to be read, the volume, the voice, the speed, the pitch, and the language. We pass the article text as the parameter of the constructor, and assign the new instance to the utterance variable.
We use the SpeechSynthesis.getVoices() method to designate a voice for the speech from the voices available on the user’s device. As this method returns an array of all the available voice options on a device, we assign the first available device voice by using the utterance.voice = synth.getVoices()[0]; statement.
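Because getVoices() returns an array, choosing one voice means indexing into it. The sketch below assumes a small helper, `pickFirstVoice` (a made-up name), that also covers the case where the list is empty, which can happen in some browsers before the voices have finished loading.

```javascript
// Return the first voice reported by the synthesizer, or null if none are loaded yet.
// (Hypothetical helper; `synth` is expected to behave like window.speechSynthesis.)
function pickFirstVoice(synth) {
  const voices = synth.getVoices();
  return voices.length > 0 ? voices[0] : null;
}

if (typeof window !== 'undefined' && 'speechSynthesis' in window) {
  const utterance = new SpeechSynthesisUtterance('Sample text'); // placeholder text
  const voice = pickFirstVoice(window.speechSynthesis);
  if (voice) {
    utterance.voice = voice; // a single SpeechSynthesisVoice, not the whole array
  }
}
```

If no voice is assigned, the browser generally falls back to its default voice for the utterance.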
The onend property represents an event handler that is executed when the speech is finished. Inside it, we change the value of the flag variable back to false so that the code that starts the speech can be executed when the button is clicked again.
Then we call the SpeechSynthesis.speak() method to start the narration. We also need to check whether the narration is paused, for which we use the read-only SpeechSynthesis.paused property. If the narration is paused, we need to resume it on the button click, which we can achieve by using the SpeechSynthesis.resume() method.
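Putting the Play steps together, one possible shape of the handler is sketched below. The name `onClickPlay` is assumed by analogy with onClickPause() and onClickStop(), and the factory `createPlayHandler`, the `#play` and `article` selectors, and the injected constructor are illustrative choices rather than the post's exact code.

```javascript
// Build a Play click handler around a synthesizer, an utterance constructor,
// and the text to narrate. The flag lives in the closure.
function createPlayHandler(synth, UtteranceCtor, text) {
  let flag = false;

  return function onClickPlay() {
    if (!flag) {
      flag = true; // block further starts until the speech has ended
      const utterance = new UtteranceCtor(text);
      const voices = synth.getVoices();
      if (voices.length > 0) {
        utterance.voice = voices[0]; // first available device voice
      }
      utterance.onend = () => {
        flag = false; // allow a fresh start on the next click
      };
      synth.speak(utterance);
    }
    if (synth.paused) {
      synth.resume(); // continue a narration that was paused earlier
    }
  };
}

if (typeof window !== 'undefined' && 'speechSynthesis' in window) {
  // Hypothetical selectors for the article text and the Play button.
  const text = document.querySelector('article')?.textContent ?? '';
  const onClickPlay = createPlayHandler(window.speechSynthesis, window.SpeechSynthesisUtterance, text);
  document.querySelector('#play')?.addEventListener('click', onClickPlay);
}
```

Injecting the synthesizer and the constructor keeps the handler easy to exercise with stand-ins; in a page script the globals could be used directly.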
Now let’s create the onClickPause() function, in which we first check whether the narration is ongoing and not paused. We can test these conditions by making use of the SpeechSynthesis.speaking and SpeechSynthesis.paused properties. If the speech is ongoing and not paused, our onClickPause() function pauses it by calling the SpeechSynthesis.pause() method.
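A minimal sketch of the pause logic might look like this. The factory `createPauseHandler` and the `#pause` selector are assumptions for illustration; the handler only relies on the standard speaking and paused properties and the pause() method.

```javascript
// Build a Pause click handler for the given synthesizer.
function createPauseHandler(synth) {
  return function onClickPause() {
    // Only pause when a narration is in progress and not already paused.
    if (synth.speaking && !synth.paused) {
      synth.pause();
    }
  };
}

if (typeof window !== 'undefined' && 'speechSynthesis' in window) {
  // Hypothetical selector for the Pause button.
  document.querySelector('#pause')
    ?.addEventListener('click', createPauseHandler(window.speechSynthesis));
}
```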
The onClickStop() function is built similarly to onClickPause(). If the speech is ongoing, we stop it by calling the SpeechSynthesis.cancel() method, which removes all utterances.
Note that on the cancellation of speech, the onend event is automatically fired, and we had already added the flag reset code inside of it. However, there’s a bug in the Safari browser that prevents this event from firing, which is why we also reset the flag in the onClickStop() function. You don’t have to do it if you don’t want to support Safari.
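The stop logic, including the manual flag reset for Safari, could be sketched as below. The factory `createStopHandler`, the `resetFlag` callback, and the `#stop` selector are assumptions for this illustration; in a page script where the flag is a shared variable, the reset would simply be an assignment.

```javascript
// Build a Stop click handler. `resetFlag` is a callback that sets the shared
// flag back to false, covering browsers where onend does not fire on cancel.
function createStopHandler(synth, resetFlag) {
  return function onClickStop() {
    if (synth.speaking) {
      resetFlag(); // manual reset, needed only as the Safari workaround
      synth.cancel(); // stop speaking and drop all queued utterances
    }
  };
}

if (typeof window !== 'undefined' && 'speechSynthesis' in window) {
  let flag = true; // stands in for the flag shared with the Play handler
  const onClickStop = createStopHandler(window.speechSynthesis, () => {
    flag = false;
    console.log('flag reset to', flag);
  });
  // Hypothetical selector for the Stop button.
  document.querySelector('#stop')?.addEventListener('click', onClickStop);
}
```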
The latest versions of all modern browsers have full or partial support for the speech synthesis API. WebKit browsers have a few issues: they don’t play speech from multiple tabs, pausing works but is buggy, and speech isn’t reset when the user reloads the page.