By DAVE GUSSOW, Times Technology Editor
© St. Petersburg Times, published January 17, 2000
"The rain in Spain stays mainly in the plain."
That bit of doggerel is familiar to anyone who has seen Eliza Doolittle repeat it until it drove the Cockney out of her accent in the musical My Fair Lady.
But here's what appeared on the monitor when we recited the refrain to our computer to test some of the new generation of "voice recognition" software:
Software that converts your speech to type is far better than it used to be. Gone are the days when users had to say a word robotically, pause, then say the next word.
Now, software companies tout "natural" speech -- talk normally and the computer responds. Accuracy has improved, with companies claiming about a 95 percent performance on current versions of their products. More computer functions, from opening programs to surfing the Web, can be done by voice.
But don't throw away the keyboard or toss the mouse. As our experiment with that soggy plain in Spain suggests, it's not a talk-and-go system just yet. It takes time and patience for the software to work as the companies promise, and even then users may not be happy with the results.
Susan Boro-Moyers of Land O'Lakes is an amputee with carpal tunnel syndrome in her remaining arm. She relies on a laptop with a touchpad, having mostly abandoned speech recognition software after three frustrating years. She could not use it for spreadsheets or numbers and never achieved more than 80 percent accuracy.
For people who have no other options, such as paraplegics, it's worth the time and effort to work with speech software, she said. For those with other options, she said, "grab that keyboard and grab the mouse because it gets so frustrating . . . Speech recognition, like most technology offerings today, offers promise that's greater than (what's) delivered."
The high-tech industry is excited about the software's potential, promising that speech recognition will control future technology and make things easier to use.
So far, the public seems to be taking a wait-and-see attitude. The industry had essentially flat sales in 1999, according to market research company PC Data, and it could be years before the technology is more widely accepted.
"The numbers suggest that it's not really a consumer mass market item right now," said Roger Lanctot, PC Data's research director. "We're still in the novelty phase. We're probably a year or two away from it becoming something that's included with almost everything."
Part of it may be getting used to the technology. "There is a real reluctance to speak to a computer," said Bill Scholz, director of engineering at Unisys Corp.'s Natural Language Understanding division in Malvern, Pa. "It will dwindle over time but only as speech applications get better."
Robert Stern, senior adviser for business development at IBM's Almaden (Calif.) Research Center, said, "There is a barrier as far as comfort levels go. People need to adopt to it just like a tape recorder."
For those not willing to wait, talk is neither cheap nor easy. Prices range from $150 to $200, and even being precise doesn't always work, as our test showed.
Following directions helps, says Roger Matus, vice president of marketing for Dragon Systems, especially if users make corrections properly and allow the software to adapt to the user.
"It can do that in an hour or two if you do it consistently with all the errors that can be in a system," Matus said. "The software will not only adjust the individual words but all the words that sound like it."
Despite flashy demonstrations by the companies and in advertising, our tests showed that users need time to learn the software and the software needs time to learn the user.
After installation, each program requires a minimum of 10 to 15 minutes of reading from on-screen material so the system can learn your voice and speech patterns. That task can sometimes take hours. Then be prepared for things not to work quite as well at first as you might hope.
The programs share similar characteristics, such as pop-up boxes that allow users to record and correct words the programs initially don't understand. They differ in style and commands. IBM and Dragon offer quick handheld reference cards so users can know what to say at a glance, and all offer some form of on-screen help and tutorials. IBM also has an animated on-screen character called Woodrow to provide tips, but much like that irritating cartoon paper clip in recent versions of Microsoft Word, it becomes more of an annoyance than an aid when errors mount.
The programs go beyond the talk-and-type dictation variety, though they promise to let you dictate at up to 160 words per minute. They also can control some programs such as Microsoft Word and e-mail, and they can help you surf the Web. At least in theory.
Tell IBM's ViaVoice Pro Millennium to "Start Outlook Express," and an on-screen bar responds: "I do not understand the command." Say "Start Program Outlook Express," and it works. I spent almost 25 minutes trying to compose and send an e-mail purely through voice commands -- before giving up. For example, ViaVoice would open the Address Book but wouldn't respond to any command to choose the recipient.
Surfing the Web also was hit or miss. Lernout & Hauspie's Voice Xpress Professional opened Internet Explorer, but wouldn't go beyond the home page. Its on-screen response: "Could not determine the language of this machine."
Philips FreeSpeech 2000 promises hundreds of commands and thousands of alternatives so the program theoretically will understand what a user wants to do, no matter how it's spoken. Not quite.
"Check e-mail" was interpreted as "Check female." Small cartoon-like bubbles on the screen told a user when a command needed to be repeated. The program would open some sites on the browser's favorites list, but not others.
Dragon's tutorial says "practice leads to mastery," and in fact it was the easiest to use right out of the box. Commands are simple ("Scratch that" deletes words), it started Outlook Express promptly, and it successfully maneuvered to send an e-mail (though it, too, understood a command to be "checked female"). Its correction dialog box had problems similar to the others. "Column" came out "talent" three times and "calm" on the fourth, even after using the correction and recording function.
The programs also work with portable digital recorders that later can be connected to the PCs for transcription. With the problems encountered mastering the PC end of it, our tests didn't venture into that area.
Dragon's Matus says most people who buy the software stick with it, particularly if they get to an 80 percent or higher accuracy level. Business users include law enforcement, insurance, doctors and lawyers. But he also says that the company sells a lot of software through America Online, which is mostly a consumer market.
He doesn't consider the competition to be other companies, but rather the keyboard and mouse.
"It's a very crafty competitor," he said. "First of all, it's free. People already know how to use it. If it makes a mistake, they blame themselves, not the keyboard."
He expects speech technology to blossom as handheld devices that can't use keyboards become more common and devices are introduced that allow drivers to control things in cars without their hands. The Internet surfing and desktop control today are a preview of things to come, particularly with digital TV. "It's almost a field test for what the future will bring," he said.