A speech synthesizer is a computerized device that accepts input, interprets data, and produces audible language.
It is capable of translating any text, predefined input, or controlled nonverbal body movement into audible speech. Such inputs may include text from a computer document, coordinated action such as keystrokes on a computer keyboard, simple action such as directional interpretation of a joystick, or basic functions such as eye, head, or foot movement.
According to a study by the American Speech-Language-Hearing Association, approximately 1.5 million people in the United States are unable to communicate through vocal language; this number does not include those who are hearing impaired. A speech synthesizer can provide an electronic means of verbal communication for individuals who are unable to speak or who have visual impairments. Since spoken language is the primary means of communication in most societies, it is often essential for people who cannot speak on their own to regain that ability through other means.
Individuals with motor neuron disease (MND) often lose their ability to speak due to weakened vocal cords. MND is a classification for disorders that cause muscle weakness and wasting, such as amyotrophic lateral sclerosis (ALS), progressive bulbar palsy (PBP), primary lateral sclerosis (PLS), and progressive muscular atrophy (PMA). In patients with cerebral palsy, the area of the brain controlling vocal muscles is damaged, resulting in speech loss.
Speech synthesizers can also be useful for people who are visually impaired. Although they may be able to speak, they cannot read or write text presented in a non-Braille format. For example, a student who is visually impaired cannot take handwritten notes during a lecture and review them later. With a speech synthesizer, however, the student can type lecture notes into a laptop and have a text-to-speech program read them back for review and revision. Without this technology, the more time-consuming method of transcribing audio-recorded lectures into Braille must be used.
There are many considerations involved in selecting a method for speech synthesis. Key factors are the type of technology used, costs, and equipment. Technology can be overpriced or can quickly become obsolete. When considering the purchase of a speech synthesizer, it is important to determine the reliability of the manufacturer as well as its policies regarding maintenance and upgrades of equipment or software. One of the most cost-effective setups is a laptop computer equipped with appropriate software and hardware. Unfortunately, many insurance companies will not cover the purchase of speech synthesizers or related assistive communication devices.
There are many technologies involved in the production of speech with speech synthesizers. The two defining components are how the user inputs the information to be spoken and how the sounds of the words are actually generated.
The first step in producing the speech is the composition of the text to be spoken. In some cases, it is as simple as loading a computer text file into a software program. In other cases, a more complicated input system is required.
There are many different input devices, but the most prevalent is a keyboard or similar typing board (such as a touchscreen). Patients with severe mobility restrictions may instead use a joystick. Special input devices have also been created that act as switches; these are programmed to accept and decipher the motions of the user, even the blinking of an eye. Essentially any muscular movement can be interpreted as a switch and programmed to produce language.
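Switch-based entry is commonly implemented as scanning: the device highlights rows of a letter grid in turn, then letters within the chosen row, and each switch activation (a blink, a head movement) makes one selection. The grid layout and two-activation scheme below are a simplified, hypothetical sketch, not any particular product's design:

```python
# Minimal sketch of single-switch row-column scanning (hypothetical
# letter grid). The device cycles through rows, then through letters
# in the chosen row; one switch activation makes each selection.

GRID = [
    "ABCDEF",
    "GHIJKL",
    "MNOPQR",
    "STUVWX",
    "YZ ._?",
]

def select_letter(row, col):
    """Translate two switch activations (row choice, column choice)
    into the letter they select on the scanning grid."""
    return GRID[row][col]

def compose(selections):
    """Build a message from a sequence of (row, column) selections."""
    return "".join(select_letter(r, c) for r, c in selections)

# Spelling "HI" takes four switch activations in this sketch:
message = compose([(1, 1), (1, 2)])  # → "HI"
```

Real scanning systems time the highlight automatically, so the user only signals "select now," but the mapping from activations to letters follows this same grid principle.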
The second step is deciphering the input and producing the desired audible speech. Data is gathered through the input device until the user indicates that the information is complete. The computer then interprets and speaks the words, phrases, or sentences. Complicated logic is involved in translating written text into spoken language.
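One part of that logic is text normalization: expanding numerals, abbreviations, and symbols into speakable words before any sound is generated. The rules and expansion table below are illustrative only; real synthesizers use far larger, context-sensitive rule sets:

```python
import re

# Illustrative expansion table; a real text-to-speech front end
# resolves many more abbreviations, and disambiguates by context
# (e.g., "Dr." as "Doctor" versus "Drive").
EXPANSIONS = {
    "Dr.": "Doctor",
    "St.": "Street",
    "%": "percent",
}

NUMBER_WORDS = ["zero", "one", "two", "three", "four",
                "five", "six", "seven", "eight", "nine"]

def normalize(text):
    """Rewrite raw text into fully speakable words."""
    for abbrev, full in EXPANSIONS.items():
        text = text.replace(abbrev, full)
    # Spell out single digits; real systems also handle multi-digit
    # numbers, dates, times, and currency.
    text = re.sub(r"\d", lambda m: " " + NUMBER_WORDS[int(m.group())] + " ", text)
    return " ".join(text.split())

print(normalize("Dr. Smith lives at 4 Elm St."))
# → "Doctor Smith lives at four Elm Street"
```

Only after normalization does the synthesizer map the resulting words to phonemes and generate audio.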
Depending on the device, multiple shortcuts may be available to the user. Examples include:
- storing phrases or sentences to reuse at a later time
- expansion of abbreviations such as ASAP, which can be programmed to speak the full phrase, "as soon as possible"
- software programs that "guess" what the user wants to say and predict the output as input is gathered; if the guess is correct, the user can accept it, thereby speeding up data entry
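The abbreviation expansion and word prediction shortcuts above can be sketched together. The stored abbreviations and vocabulary here are hypothetical examples, not a real device's contents:

```python
# Sketch of two common rate-enhancement shortcuts: abbreviation
# expansion and prefix-based word prediction. All stored entries
# are hypothetical examples.

ABBREVIATIONS = {
    "asap": "as soon as possible",
    "ty": "thank you",
}

# Candidate words, ordered by how often this user has chosen them.
VOCABULARY = ["the", "thank", "thirsty", "tired", "water", "hello"]

def expand(token):
    """Replace a stored abbreviation with its full phrase."""
    return ABBREVIATIONS.get(token.lower(), token)

def predict(prefix, limit=3):
    """Suggest the most frequently used words starting with the
    typed prefix, so the user can accept one instead of typing it."""
    return [w for w in VOCABULARY if w.startswith(prefix.lower())][:limit]

print(expand("asap"))  # → "as soon as possible"
print(predict("th"))   # → ['the', 'thank', 'thirsty']
```

Both techniques reduce the number of keystrokes or switch activations per spoken word, which directly addresses the input bottleneck discussed next.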
Even with the advanced technology available for speech synthesizers, a bottleneck of information often occurs with the input. A typical spoken conversation takes place at a rate of 150–200 words per minute. While some individuals can become proficient at touch-typing, allowing for greater success with interactive conversations, many individuals are challenged to produce even 15 words per minute with communication devices.
The typical setup for individuals who use a computer or touchscreen includes a computer, keyboard, monitor, and speakers. In many cases, this equipment can be attached to a wheelchair or bed frame, allowing the user access to "speech" at any time. Other users may simply carry a laptop, batteries, and the necessary connection cables.
For users unable to manipulate a computer or keyboard-style input device, a period of learning is required to become accustomed to switch-style inputs. The user must learn the step-by-step process of composing thoughts into text for output.
A major challenge for individuals who are visually impaired is the presence of graphics in text. Because graphics typically lack a textual equivalent, the synthesizer cannot recognize or speak them, and the user may miss some of the information on the screen.
Once an individual has selected a speech synthesis device, little follow-up is necessary. Hardware and software evolve rapidly, however, so devices may benefit from periodic upgrades. Depending on the underlying cause of speech loss, some patients may need to change devices as they lose or regain the ability to speak or move.
Through a speech synthesizer, non-vocal users can communicate with spoken words and people who are visually impaired can hear written text. The challenge of becoming proficient with these devices may be greater for some individuals based on physical restrictions.
Holmes, John, and Wendy Holmes. Speech Synthesis and Recognition, 2nd Edition. New York: Taylor & Francis, 2002.
Pausch, Randy, and Ronald D. Williams. "Giving CANDY to Children: User-Tailored Gesture Input Driving an Articulator-Based Speech Synthesizer." Communications of the ACM 35, no. 5 (May 1992): 58–67.
Sasso, Len. "Voices from the Machine." Electronic Musician February 1, 2004.
Maxey, H. David. "Smithsonian Speech Synthesis History Project." National Museum of Natural History, Smithsonian Institution. July 1, 2002 (cited March 23, 2004 [June 3, 2004]). <http://www.mindspring.com/~ssshp/ssshp_cd/dk_779.htm#V>.
Olshan, Michael. "Voice Lessons: Speaking with ALS." American Speech-Language-Hearing Association. 2004 (cited March 23, 2004 [June 3, 2004]). <http://www.asha.org/public/speech/disorders/als-voice-lessons-speaking-with-als.htm>.
"Speech Synthesis." Wikipedia. March 23, 2004 (cited March 26, 2004 [June 3, 2004]). <http://en.wikipedia.org/wiki/Speech_synthesis>.
"What is MND?" Motor Neuron Disease Association. March 26, 2004 (cited March 26, 2004 [June 3, 2004]). <http://www.mndassociation.org/full-site/what/index.htm>.
American Speech-Language-Hearing Association. 10801 Rockville Pike, Bethesda, MD 20852. (800) 638-8255. email@example.com. <http://www.asha.org>.
Motor Neuron Disease Association. P.O. Box 246, Northampton NN1 2PR, United Kingdom. 01604 250505; Fax: 01604
Stacey L. Chamberlin