Technology for disabled users: Lessons learned

Vendors building devices for disabled users have solved many interface challenges for electronic devices. So why aren't these technologies available in the consumer market? Too often, the people asking "Why reinvent the wheel?" are the very people reinventing it.

By Nicholas Cravotta, Technical Editor -- EDN, May 24, 2001

I've been interested for some time in "assistive technologies" (technologies for disabled users). My friend Ben is dyslexic, which in his case means he cannot read without the help of another person. I remember some of the games on my old TRS-80 Model-I computer talking to me. Why, I wondered, can't today's computers read to Ben? Another friend, Jonathan, has an unnamed condition that has left him with little use of his arms and legs and taken his ability to speak. He communicates with his family by clicking his tongue: once for yes, twice for no.

The main drive for this article was to discover whether technologies are available to help my two friends. I'm also driven by a haunting fear that one day I may lose the use of my own eyes or hands and be unable to support my family. In all honesty, the greatest of these fears is that I won't be able to live my life the way I'm used to living it.

Initially, I intended this article as a survey of the current state of technology for disabled users. What I found was a disturbing dichotomy. Technologies, such as text-to-speech translation, have been developed in two independent markets: consumer/business technology and assistive technology. Little cross-pollination between the two groups exists. On the one hand, the consumer/business industry has resources (R&D dollars) and volumes (spreading R&D costs across millions of devices sold). On the other hand, the assistive-technology industry has experienced designers who understand how to solve certain difficult problems. For example, today's cell-phone users receive and answer e-mails using the mind-boggling and tedious method of typing, using "2" for "a," "b," and "c"; "3" for "d," "e," "f"; and so on. The assistive-technology industry has solved this problem (see sidebar "Typing on a cell phone, pg 70").

The needs of disabled users have exposed problems with and revealed answers to technologies we commonly gripe about and with which we suffer. Visually impaired users have prompted the development of technologies for using complex GUIs without the need for sight or a mouse. Motor impairment has led to input devices and interfaces that can allow a person to "speak" full sentences using a single-button interface and devices that allow truly hands-free computer use. Learning impairment has resulted in symbolic interfaces that resolve many of the problems of language-context ambiguity. This article aims to offer you a glimpse of the gold that the assistive-technology industry has uncovered to speed the cross-pollination process.

Predictive software has been a great boon to disabled users. For example, people use some words more frequently than others. Access to more frequently used words, then, should require fewer keystrokes. Vocab Voice Systems, for example, offers the $600 Vocab+. A virtual keyboard (Figure 1) displays the 26 letters of the alphabet, as well as the 32 most frequently used English words. If you type "t," you see the 32 most frequently used words beginning with "t." Follow "t" with an "h," and you'll see 32 "th" words. You keep adding letters until either the word you want appears or you finish typing it. This type of prediction, the company claims, can reach 85% of the words in a conversational dialogue in two keystrokes (one for the first letter and one for the most frequently used word). Three strokes reach 93% of the words. You can also add words that you use frequently to a nested user-defined page (two keystrokes away). Contrast this process with using an interface that groups functions by topic, resulting in commonly used functions buried several menu layers deep.
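To make the mechanism concrete, here is a minimal sketch of prefix-based prediction in Python. It is not Vocab+'s actual implementation; the function names, the toy word counts, and the way keystrokes are tallied are all assumptions made for illustration.

```python
from collections import Counter

def build_counts(text):
    """Count how often each word appears in a sample of text."""
    return Counter(text.lower().split())

def predict(counts, prefix, limit=32):
    """Return up to `limit` words that start with `prefix`, most frequent first."""
    matches = [word for word in counts if word.startswith(prefix)]
    # Sort by descending frequency, then alphabetically so ties are stable.
    matches.sort(key=lambda word: (-counts[word], word))
    return matches[:limit]

def keystrokes_needed(counts, word, limit=32):
    """Letters typed before `word` is offered, plus one stroke to select it."""
    for typed in range(len(word)):
        if word in predict(counts, word[:typed], limit):
            return typed + 1
    return len(word)  # never offered early: the user types every letter
```

With a list built from real conversational text, `keystrokes_needed` gives one way to check a claim such as "85% of words in two keystrokes" against your own users' vocabulary.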

One important consideration is whether frequency is static or dynamic. A static list, for example, is based on overall language use, not necessarily one person's usage. Therefore, certain common words may be difficult to reach. A dynamic list, on the other hand, puts at the top of the list the words that a user most frequently types. An advantage is that words a user doesn't use don't clutter the screen. However, dynamic lists have the disadvantage of constantly changing order. A static list holds words in set order, making items easier to find, because you can memorize their positions. In either case, user-defined lists, such as lists that you can access from a top-level menu or grid, allow users to consistently and efficiently access their own most frequently used words. The ability of users to define macros can also help.
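The trade-off between the two orderings can be sketched in a few lines of Python. The `WordList` class below is hypothetical and keeps both views side by side: a static order that never moves and a dynamic order that follows one user's selections.

```python
from collections import Counter

class WordList:
    """Hypothetical word list offering both a static and a dynamic ordering."""

    def __init__(self, words):
        self.static_order = list(words)  # fixed positions, easy to memorize
        self.uses = Counter()            # how often this user picked each word

    def select(self, word):
        """Record that the user chose `word`."""
        self.uses[word] += 1

    def dynamic_order(self):
        """Most-used words first; unused words keep their static order."""
        # sorted() is stable, so words with equal counts do not swap places.
        return sorted(self.static_order, key=lambda word: -self.uses[word])
```

Every call to `select` can change what `dynamic_order` returns, which is exactly the "constantly changing order" drawback described above, whereas `static_order` stays put.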

As a caveat, it's important to point out the difference between true predictive technology and annoying predictive technology. True predictive technology reflects actual usage. Annoying predictive technology—such as Windows' latest incarnation, in which pull-down menus show only the last few selections chosen—makes inputting or manipulating information more difficult.

Technological oppression

To some degree, desktop computers offer a great platform for communication, but they often limit our options to "input information" and "manipulate information." Windows, for example, although extensible, is oppressive. Just try using a PC without a mouse or without looking at the screen.

Why does the ability to use a mouse or look at the screen matter? Consider voice recognition. If you need a mouse to use an application, then you are tied to your chair, and the advantage of being able to talk to your computer is limited to your freedom from typing. Additionally, if you must be able to see the computer to use it, you are limited in what other activities you can do while you are using voice recognition. Wireless technologies, such as Bluetooth, promise to free you from cables, but the applications themselves, which require the use of a mouse and a screen, will still keep you tied down.

The most oppressive aspect of many interfaces is that they treat everyone the same way. Most interfaces rely almost entirely upon a person's ability to read. However, there are many reasons that a person may be unable to read. For example, visual impairment prevents people from seeing the words; dyslexics cannot, without assistance, bridge the gap between words and meaning; people with limited cognitive ability may never have the ability to understand written words; and many people are simply illiterate. Others of us would like to be able to use our computers without having to stare at a screen all day.

Again, you don't necessarily need to be free of the display or input device to use a computer, and just how you would make better use of a computer that you don't have to sit in front of remains to be seen. Consider the difference between using a phone with a cord and using a portable phone. The portable phone allows you to walk to a file cabinet and look up information you need or carry trash out to the garage during a conversation with a friend. The freedom to walk around was unnecessary to continue use of phones as you knew them. However, once you had the freedom, you came up with new ways to take advantage of it.

The success of GUIs in desktop space has prompted many developers of embedded systems to think that GUIs are the way to go. However, it is important to note that GUIs are only one interface scheme and, more important, devices need not limit users to only one method of input. People learn differently, speak differently, and want to do different things. So, why do they all have to use a particular device in the same way? A device that offers creative alternative interfaces can appeal to and better serve more people. The question becomes, how easily can you adjust the thresholds on a device based on your ability to use it?

For example, PDAs use different handwriting-recognition schemes. For each scheme, it takes time to learn how to write a simple note or perform "difficult" tricks, such as shifting from lowercase to uppercase. Therefore, even an experienced user of one PDA can't necessarily just pick up and immediately use another brand of PDA. This situation is the result of companies expecting people to adapt to devices instead of expecting devices to adapt to people. Of course, this situation might be intentional. It discourages users from changing from one PDA to a competitor's device. However, consider that it also discourages a competitor's users from changing to yours.

A helpful feature (if it's possible) would be to expose the letter strokes in such a way that users could define their own strokes for letters. The leading vendor might avoid this feature to keep market share, but players with less market share can attract customers by allowing them to adapt the interface to their preferences or, rather, what they are used to. Customization of a device at this level lets users optimize devices to their personal needs. Customizing a device can be difficult and time-consuming, so not every user will want to do it, but those who do will at least be able to. And consider the flexibility of such customization. A user could create shortcut strokes that represent commonly used words. Of course, the words would be those that the user actually uses, not those an engineer in a lab thinks users want. Finally, flexible interfaces get around the difficult problem of a competitor's suing companies for giving a device a look and feel that's too similar to its device. By supplying the means to appropriately adjust the interface, companies let the users themselves give the devices the look and feel of another device without fearing legal reprisal.

Help yourself

One barrier to open systems is the shift in electronics to a "services"-based approach, meaning vendors want to keep making money after the sale of electronic equipment by providing additional services or software upgrades. In some ways, this process hinders application development: Vendors don't want fourth-party vendors siphoning from their revenue stream. As a consequence, however, users become unable to develop services for themselves. In the assistive-technology industry, users are tied to vendors for improvements and software advancements. However, given the limited market size of this industry, there's not enough volume to drive more than a few applications. Thus, the community becomes trapped.

The DSL market has similar issues with open and closed technology. Of course, users won't be developing their own applications, but many third- and fourth-party vendors are interested in getting a piece of the DSL-services market. To some degree, an open-systems model reduces the revenue stream of equipment vendors. The question is whether it guts a market or enables it.

For example, in the assistive-technology industry, increasing volumes would mean a drop in overall hardware and software cost. A variety of services would increase the application range of devices to further increase volumes. The issue then becomes how much expertise you need to develop software for the device. As much as I've resisted Windows and C++ in the embedded space, these technologies do lower the level of expertise required to write software, and therefore open the system wide.

The paradox with assistive technologies is that, although they extend the freedom of disabled users, they also bind these same users by not extending that freedom to being able to develop applications for themselves. And this situation affects not only disabled users but also their families and friends, who are often willing to pay whatever it takes to help. The applications simply aren't there to pay for. Fortunately, the tide is changing.

For example, current development of symbol grids is challenging. Symbol grids allow users to select symbols, either to bring up topic pages of more symbols or to select a specific symbol or word. A typical application for symbols is a speaking device; a user selects symbols to create a sentence, and then the box speaks the sentence aloud.
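As a rough sketch of the idea, a symbol grid can be modeled as a set of pages in which each symbol either opens another page or contributes words to the sentence. The page names, symbols, and phrases below are invented for illustration and do not come from any vendor's product.

```python
# Hypothetical grid: each symbol either opens a page or adds words.
PAGES = {
    "home": {
        "I want": ("word", "I want"),
        "food": ("page", "food"),
    },
    "food": {
        "apple": ("word", "an apple"),
        "back": ("page", "home"),
    },
}

def build_sentence(selections, pages=PAGES, start="home"):
    """Follow a sequence of symbol selections and return the sentence to speak."""
    page = start
    words = []
    for symbol in selections:
        kind, target = pages[page][symbol]
        if kind == "page":
            page = target          # topic symbol: move to a deeper page
        else:
            words.append(target)   # word symbol: add to the sentence
    return " ".join(words)
```

Selecting "I want", then "food", then "apple" yields the sentence "I want an apple", which a speaking device would then pass to its speech synthesizer.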

Vendors claim that, although editing cells on a grid is not difficult, developing the layout and philosophy of the grid itself is. This issue comes down to efficiency. Well-crafted grids reflect usage and optimize the time it takes to make common statements. For example, my friend Jonathan selected one click to mean "yes" and two clicks to mean "no" before he understood that he would be saying "no" far more often than "yes." Someone with experience could have caught this inefficiency before the family learned this method and was unwilling or unable to change.
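The cost of Jonathan's choice is easy to estimate with a little arithmetic. The sketch below assumes, purely for illustration, that 7 of every 10 answers are "no"; the real proportion is not known.

```python
def total_clicks(clicks_per_answer, answer_counts):
    """Total clicks needed to give every answer in a tally of answers."""
    return sum(clicks_per_answer[answer] * count
               for answer, count in answer_counts.items())

# Hypothetical tally: 3 "yes" and 7 "no" answers out of 10.
tally = {"yes": 3, "no": 7}

original = {"yes": 1, "no": 2}  # the scheme the family learned
swapped = {"yes": 2, "no": 1}   # the shorter code goes to the commoner answer

print(total_clicks(original, tally), total_clicks(swapped, tally))
```

Under that assumed tally, the original scheme costs 17 clicks per 10 answers and the swapped scheme costs 13. The same principle, giving the cheapest selection to the most frequent message, applies to the layout of a full grid.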

Most symbol devices allow some customization and editing of grids. However, the interfaces are limited, and creating, labeling, and placing symbols require many keystrokes. For adding a single symbol, the interface is not onerous. For more complex customization, you need special software. DynaVox, for example, offers the $299 PageWizard for building grids on a PC and then downloading them to the $6995 DynaVox3100, a dedicated, symbol-grid device with a touchscreen. However, even a casual look at DynaVox Windows reveals that it is no easier to use than FrontPage (once you understand what you're doing). And PageWizard creates only static pages; you can't create pages that show topic symbols that link to deeper pages. For that, you have to purchase DynaVox Windows for $995.

Dedicated devices won't disappear overnight. However, many vendors are shifting to general-purpose tablet computers or PDAs with touchscreens running some version of Windows. (Tablet computers are full PCs that can run all the software your desktop can.) For example, on the PC side there's Aqcess Technologies' Qbe Vivo for $1999, and on the Mac side there's the Gemini for $5995. There's also a plethora of PDAs. Many assistive-technology companies buy PDAs, add their own software, and resell the devices as complete products, using existing PDAs such as Compaq's iPAQ Pocket PC or members of the Palm Pilot line. For those who want to build tablet computers or PDAs, a number of companies offer hardware and software reference designs to quickly get you started. For example, Accelent Systems and InHand Electronics both offer reference designs for building compact handheld devices.

Given that no accepted standard exists for "the best symbol grid," designing a grid is still to some degree an arbitrary process. The problem with difficult-to-use or expensive development platforms is that they actually increase the need for experts. Given an open development platform, expertise for designing grids could easily arise as a community expertise. Instead of requiring expert individuals who then have to charge for their time, expert users could share their experience via the Internet. A disabled user could download pages if adequate ones had already been created or create a new topic and upload the pages for others to use. The community becomes a partner of the industry and is empowered to serve itself.

I walked into this project with a theory in mind: Mainstream companies can learn a lot from disabled users, even if these people aren't using the companies' products. However, I almost fell into the hypocrisy of working from the vacuum of my own mind without consulting any disabled users. In my hubris, I thought I understood the field of assistive technologies before I took a serious look. When I began talking with people, I suddenly discovered just how little I understood about these kinds of technologies.

One key lesson I learned throughout this project is that people are not statistics. You can describe the behavior of a group of people using statistics, but it does not follow that the final numbers represent any of the people surveyed. Case in point: No one has 2.3 children. How and why people act the way they do is often mysterious at the individual level. And, as I further researched this article, I began to understand just how little I truly understood about the human-factor issues involved. I would have liked to have interviewed more people and spent more time with organizations that specialize in adapting technologies, but there are limits on time and words. The point is that this industry has many excellent ideas you can tap into and people you can learn from.

Too many companies wait for technology to be ready before they implement it. The reality is that in most cases, the technology in question has been ready for years, and few companies on the commercial side are aware of it. The trick is not to wait around until some other company says the technology is ready but to simply get on with implementing it. For example, the Atari 800 computer, using a program called SAM (Software Automatic Mouth), decades ago supported text-to-speech conversion. I remember typing sentences and listening to the tinny voice speak them aloud. Current R&D efforts have focused on the quality of spoken speech, such as natural pausing and inflection, not just making recognizable speech (see sidebar "Crossing the quality gap"). But the base technology, converting text to speech, hasn't changed much. Yet it is still difficult to make a computer talk. Why?

Not surprisingly, most of the problems with implementing technology involve the interface. The difference between SAM on the Atari 800 and text-to-speech conversion on a PC is that the PC voice sounds a bit more realistic, and the PC has a GUI that's pretty but doesn't offer much value. The algorithms have changed, but the use and implementation of the technology have not. If anything, too many companies have focused on developing the technology itself, reinventing difficult work that has already been done by other people, instead of inventing new ways of using it.

I discovered too many technologies to cover in one article: optical character recognition, video communication, wireless access, screen magnifiers, speech recognition, text-to-speech conversion, even more intriguing input devices, tactile devices, and so on. I discovered a few instances in which the assistive-technology and commercial industries meet, such as in the case of the i.d.mate from En-Vision America, which allows blind people to label pills or boxes and use a bar-code reader to hear what's inside. But what I found most often was a lack of sharing among the industries. General-purpose computers have found their way into few assistive-technology products, and little if any expertise in designing human interfaces to computers has worked its way into the commercial space.

Crt Marincek, chairman of the 2001 European Conference for the Advancement of Assistive Technologies, offers a statement heard so many times from the assistive-technology community: "Persons with special or different needs should not be excluded from advances in technology."

Perhaps we should be asking instead how they can be drivers of technology.

If you found this article useful and want EDN to continue to explore assistive technologies, please drop me a line at ednnick@pacbell.net.

Crossing the quality gap
Perhaps you've been "lucky" enough to watch some of today's cartoon shows with your children. As you watch them, "cheap" is a word that comes to mind; the animation is so rushed you could almost put any words you want into the characters' mouths. However, going the distance to create movement that mimics actual human speech requires substantial effort.
Quality is a key issue for technologies that portray human characteristics, such as TTS (text-to-speech) or 3-D avatars. Making devices seem humanlike and making them humanlike are two different things. Bridging that gap, or increasing the quality of computer expression, requires substantial effort and expense. So why bother?
One reaction people have to TTS conversion is that they think the voices sound like computer voices. If the goal is to make someone think he or she is dealing with another person, not a computer acting like a person, crossing the quality gap is important. When making a customer-service phone call, for example, people have little enough faith that a person will be able to help them, never mind a computer that will run them around in circles.
However, realistic computers offer much more than their ability to trick us. Computers have an amazing quality: They don't mind doing something over and over again or answering endless "stupid" questions. To build on the example of cartoon characters, the Center for Spoken Language Understanding at Oregon Graduate Institute in Portland has been using an animated character named Baldi (see picture) to teach deaf children how to not only read lips, but also speak words for themselves. The researchers based Baldi on the Fluent Animated Speech software, which Sensory currently owns. Accurate expressions and articulations are created using visemes, the visual component of phonemes, which are the smallest individual components of speech. For each viseme, the software uses a nonlinear morphing technique that takes several dozen static pictures and blends them together to create a fluid and accurate representation of a person speaking. Scripting software from other vendors can determine what Baldi will say in an application. Baldi can currently communicate at five syllables per second, the same rate at which humans produce speech. In its next incarnation, Sensory would like to make Baldi transparent, so that students can view the movement of the tongue, teeth, and palate, to better see how to correctly produce a particular syllable or word.
In a similar vein, Vcom3D has created signing avatars, which use sign language to communicate. Currently, the avatars can sign only preset messages; an authoring tool is due out in December. One of the reasons for the delay is that translating natural language to sign language is challenging. Dynamically resolving ambiguities of language, as well as translating spoken colloquial to signed colloquial, requires more than a simple one-to-one mapping scheme.
A key characteristic of a computer persona is that it never gets tired. Avatars repeat phrases as many times as students desire. Note the use of the term "desire" and not "require." It is the student, not the instructor, who gets tired of going over the same point again and again.


Beyond the mouse: pointing devices
Several alternatives to using a mouse, or pointing device, are under development. Perhaps the most obvious is the "head mouse." Maui Innovation Peripherals offers the $179 Cyclopx, an optical device that mounts on your head. The company claims that the Cyclopx can detect movement of 1° of angular motion. To "click," you lean your head forward. Other technologies use cameras to track user motion. The Tracker 2000, which Madentec offers for $1895, uses one camera to track a silver dot stuck between your eyebrows. The dot acts as a reference point as the device follows full head movement. The Quick Glance System from EyeTech Digital Systems sells for $3995 and uses two lights to shine reference points on one eye. A mounted camera then determines the gaze point, or where the eye is looking. You blink to click. Finally, the $4395 VisionKey System from EyeCan requires a special goggle adapter for glasses. The goggles hold a grid of symbols (letters and numbers), which the computer monitor also displays. Users type by holding their gaze on a selection until the device confirms the selection with a highlight and a beep.
Note that each of these systems needs some kind of reference point to track motion. Also note that these systems require a degree of unnatural stability. In other words, when you look at something, you usually move both your eyes and your head. Both the Cyclopx and Tracker assume a locked eye position. The Quick Glance System requires a locked head position. (The mounted camera tracking your eye assumes that your eye stays within a particular field, meaning you must hold your head stable.)
I tried the Quick Glance System and discovered a few interesting points as I played a game of solitaire entirely with my eyes. Surprisingly, holding my head in the same place was not as tiring or restrictive as I anticipated. Over time, as processing power increases on the desktop, the camera will be able to track a wider field and compensate for normal movement. Calibrating the device, however, seemed to be more of a hit-or-miss proposition, as is the case for many technologies when you apply them at the individual level. It's unclear to me whether this issue is a computational, algorithmic, or user problem. The impact of poor calibration is that the cursor sometimes isn't exactly where you want it. Solitaire has a high tolerance for such inaccuracies, given the size of the cards. Using pull-down menus, however, required me to determine whether the cursor was on the wrong item and then make an adjustment with my eyes to correct for the cursor position, only slightly slowing me down.
The use of blinking for clicking creates an interesting system bottleneck. People normally blink, so you don't want the system to treat every blink as a button click. Thus, as unnatural as it feels at first, you have to blink slowly. This slow blink limits the speed at which you can use the system. It illustrates a problem that arises any time you want an input node to represent more than one input value. You have to differentiate between uses of the node, usually through timing. The best illustration of this issue is scanning technology, which serves people whose disabilities limit them to communicating with a single button. A delay in pressing translates into a change in the button's meaning.
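Both cases come down to reading meaning from timing, which a short Python sketch can illustrate. The 400-ms blink threshold and the 1-second dwell per item are assumed values chosen for the example, not figures from any of the products described here.

```python
def is_deliberate_blink(closed_ms, threshold_ms=400):
    """Treat only a slow blink (eye closed long enough) as a click."""
    return closed_ms >= threshold_ms

def scanned_item(items, press_ms, dwell_ms=1000):
    """Single-button scanning: the highlight steps through `items`, one per
    dwell period, and the button press selects whichever item is highlighted."""
    steps = press_ms // dwell_ms       # how many times the highlight has moved
    return items[steps % len(items)]   # wrap around to the start of the list

choices = ["yes", "no", "help"]
print(scanned_item(choices, press_ms=1300))  # pressed during the second dwell
```

The sketch also shows the bottleneck: reaching the last of N items takes about N dwell periods, so shortening the dwell speeds up selection only as far as the user can reliably time a press.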
The most elaborate pointing device I found was the JestPoint system from JesterPoint. JestPoint uses a stereo camera to create a depth map of moving objects in front of it. The demonstration I tried was a soccer game, in which I defended a net from incoming soccer balls. I stood in front of a blue screen, and the soccer net and balls were displayed on a television placed in front of me. I was superimposed on the net, behind the balls. As I swung my arms or kicked, I could hit balls out of the way of the net. If I swung or kicked too soon, I missed the ball. Interestingly, JesterPoint pitches JestPoint as an alternative to touchscreens with the advantage of allowing larger screens, more viewers, and no risk of scratched or dirty screens.
I also tried a version of a "thought mouse," but the technology still seems far out. The device required me to put conductive gel on my head, which consequently got into my hair, and measured changes in brain behavior. In its early stages, the technology resembles voice recognition several decades ago in that the resolution between thoughts is not yet fine. In other words, for the computer to differentiate between two thoughts from a user, the thoughts have to be quite different. As an analogy, early voice-recognition technology could often confuse similar-sounding words, such as "one" and "done." To avoid this problem, an application would use more disparate words, such as "one" and "finished." The thought mouse I tried required eyebrow movement, which struck me as not exactly thought. Even so, I had significant trouble maneuvering through the demo maze; the process seemed just a little less than random.


The fading benefits of closed systems
I challenged several vendors to justify why they continued building fixed-function, closed-system devices for the assistive-technology market. After all, why continue creating dedicated hardware and software when off-the-shelf computers are cheaper and already offer a solid software foundation, including operating system and browser?
One argument that vendors offer is that disabled users require robustness. When a device is so critical to communication, users cannot afford to have it randomly crash. That's one of the chief reasons used to justify dedicated devices written on closed platforms. The problem with Windows is that programs "sometimes" crash. However, computers never do things "sometimes." In a simple, fixed-function environment, you'll have an easier time tracking down those "sometimes" kinds of problems. In the complex world of Windows, in which loading Word (written by the same company that wrote the operating system) can crash your computer, problems that happen "sometimes" can destroy a bottom line.
However, as Windows continues to gain stability, this argument fades. Many assistive-technology vendors have decided that the flexibility of Windows outweighs the burden of having to develop robust proprietary hardware and software. Now, vendors can focus on creating applications on existing hardware and software platforms rather than dividing their efforts between developing and maintaining such platforms. Additionally, using commercially available hardware and software significantly reduces overall cost. Just compare the price of any fixed-function, closed-platform device to one developed on an open system. The price is typically four to 10 times greater for fixed-function assistive-technology devices.
Another argument states that proprietary data formats result in smaller files than, say, verbose HTML files. Given the low and dropping cost of memory, this issue isn't that important, except for devices connected to a network via a low-speed link. If memory or bandwidth is an issue, a plug-in could decompress compressed pages on the fly.
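As a rough illustration of on-the-fly decompression, the Python sketch below uses the standard `zlib` module to store a page compressed and expand it only when it is displayed. The sample page and the function names are made up for the example; a real plug-in would sit between storage and the browser.

```python
import zlib

def pack_page(html):
    """Compress an HTML page for storage or a slow link."""
    return zlib.compress(html.encode("utf-8"))

def unpack_page(data):
    """Decompress a stored page just before displaying it."""
    return zlib.decompress(data).decode("utf-8")

# Hypothetical grid page: verbose, repetitive markup compresses well.
page = "<table>" + "<tr><td>symbol</td></tr>" * 200 + "</table>"
packed = pack_page(page)
print(len(page), len(packed))  # original size vs. compressed size in bytes
```

Repetitive markup such as a grid of table cells shrinks considerably under a general-purpose compressor, which weakens the case for a proprietary format on size alone.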
Control is the foundation of another argument. Having control of the interface allows you to keep it sparse and reduce crashes. Tight software, without unnecessary frills, is less likely to let you down. General-purpose software, such as browsers, on the other hand, can act unpredictably. For example, a browser could resize on you, changing the shape and layout of symbols and other elements. Thus, carefully designed symbol grids can be cut off on the sides or bottom. However, vendors could design variations of grids for various screen and font sizes, or users could spend time upfront customizing products based on their needs, such as a huge font for someone with a visual impairment. Although this objection is a consideration, it is not insurmountable.
Another angle on the control argument is that third parties can write flaky software. This is the classic argument of companies developing closed platforms. Admittedly, third parties can write software that brings down the whole system, but they can also help flesh out the software offering for a device. For example, one of the advantages of the Palm Pilot is the tremendous amount of software available from third parties and users. Yes, bad software exists, but the good stuff tends to rise to the surface. In some respects, closed systems leave users hostage to one company. If the company doesn't feel a project is worth pursuing (because it doesn't generate enough profit), the project will never happen. Third parties enable markets that the device manufacturer simply doesn't have the resources to pursue.
One disadvantage of open systems is that they can emulate dedicated or fixed-function devices, but only to a degree. One area in which they come up short is during boot-up. A dedicated device can be up and running as soon as you throw the power switch. PC-based devices have to boot and then often require a user or guardian to select the dedicated application. For example, toddler software on a PC requires an adult to load the software—the dedicated application—because it is beyond the ability of the child. A second deficiency is that programmable devices often aren't as bulletproof as dedicated devices. Again, as an example of toddler software, my son can click the mouse on "exit" or press the Windows task button on the keyboard and break out of the dedicated application. The first is an application issue: The software should require a keystroke sequence as complex as control-alt-delete before letting my child have access to the rest of my computer (and hard drive)! The second is an operating-system issue: Windows never lets an application have complete control. The operating system recognizes the task button before the application has a chance to disable it.
In the end, there's the balance between the "reliability" of software and the availability of software, and cost is a heavy influence on it. Already, dedicated hardware is too expensive to develop. Additionally, browsers and Web editors are now free. The closed-system software market may not last long, either.


The problem with TTS
The reality with TTS (text-to-speech) conversion is that, more often than not, the interface kills it. The simple fact is that business people don't understand how to be read to. For several years, I've had my dyslexic friend Ben test all the TTS software I've come across. Without fail, he discovers in minutes why he cannot use the software. Here's the current list of problems we've come across:
Problem: Selecting the text to synthesize into speech is too complicated. Some programs make Ben highlight text, cut it, paste it into a reader, and then press the read button. If you are scanning e-mails to find an important message, this process simply takes too long.
Solution: To partially solve this problem, the computer could immediately begin reading text when Ben cuts it. (Reading mode is toggled, so he can still use the cut and paste normally.) By reducing the number of steps required for common functions, certain applications become easier and, therefore, more feasible.
Problem: The program assumes too much about the typical user. For example, Ben uses Netscape as his mail reader. Netscape inserts line breaks along the right edge of paragraphs, much like a manual typewriter does. One TTS reader automatically paused at the end of each line break. When we tried to change this configuration, we discovered that the designers had determined that, in most cases, a pause at a line break is appropriate. Instead of making an option, they locked it in. Try listening to a paragraph of text with random pauses, and you'll understand the problem.
Solution: In this case, making a toggle option would solve the problem. In general, leaving usage decisions up to the user by making them options extends the capability of the device to adjust to the user rather than forcing the user to adjust to the device.
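As a rough illustration of what such a toggle might look like, here is a minimal Python sketch. The function name `prepare_for_speech` and the option `pause_at_line_breaks` are hypothetical and do not come from any product Ben tested. The sketch assumes that a blank line marks a real paragraph break and that a single line break is only the mail program's hard wrap.

```python
import re


def prepare_for_speech(text: str, pause_at_line_breaks: bool = False) -> str:
    """Return text ready to hand to a speech engine.

    With pause_at_line_breaks=False, single line breaks inside a paragraph
    are joined with spaces so hard-wrapped e-mail reads as continuous prose.
    Blank lines still separate paragraphs. With True, the text is untouched.
    """
    if pause_at_line_breaks:
        return text
    # Split on blank lines (assumed to be real paragraph breaks).
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Collapse every remaining run of whitespace, including hard wraps.
    joined = [" ".join(paragraph.split()) for paragraph in paragraphs]
    return "\n\n".join(joined)
```

The default favors the hard-wrapped case, but the user can flip it, which is the point: the decision belongs to the listener rather than to the designer.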
Problem: The designers place artificial limitations on the software. Ben is accustomed to people reading to him. His college textbooks were on tapes that he could speed up to four times normal reading speed. TTS software typically uses a slide bar to determine the number of words to read per minute. Many of the slide bars are limited to perhaps 100 words per minute, even though a more powerful computer could certainly read faster. (If you think 100 words per minute is fast enough, read the next problem.) Ben is therefore limited to a fraction of the rate at which he could be reading simply because the designers used a slide bar instead of an input box without a maximum limit.
Solution: Leave performance characteristics open to future advances. The computer that the software now runs on might max out at 100 words per minute, but in six months, a new machine will make that limitation unnecessary.
Problem: The TTS program reads but can't scan. When you read, you probably scan for information that interests you. Listening is no different. Ben can listen with full comprehension at 125 words per minute, but he can scan at over 200 words per minute. When he hears a word that interests him, he wants to be able to drop the rate to fewer words per minute (his comprehension speed) and back up 10 words in the text. Current interfaces will let you pause but not back up; you have to go back to the beginning, which means that you have to manually scan the text to find the point of interest and rehighlight the text from that point on. Also, changing speed requires opening an options window. Finally, the new speed change usually requires you to press stop, so you're stuck back at the beginning anyway.
Solution: Add a mode in which pressing the space bar toggles between scan speed and read speed. And how about a "what-was-that?" button that goes back 10 words? Some operations, such as scanning, are so common and so critical to efficient use that they should be standard functions, not complex series of steps.
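The state such a reader would have to track is small. The following Python sketch is an assumption about how it could be organized, not a description of any shipping TTS product. The class `ReaderState` and its method names are invented for illustration, and the default speeds are simply the figures quoted for Ben above. Note that the speeds are plain numbers with no built-in ceiling.

```python
class ReaderState:
    """Tracks position and speed for a hypothetical TTS reader."""

    def __init__(self, text, read_wpm=125, scan_wpm=200):
        self.words = text.split()
        self.position = 0        # index of the next word to speak
        self.read_wpm = read_wpm  # comprehension speed, no upper limit imposed
        self.scan_wpm = scan_wpm  # skimming speed, no upper limit imposed
        self.scanning = True      # start out skimming

    @property
    def wpm(self):
        """Current speaking rate in words per minute."""
        return self.scan_wpm if self.scanning else self.read_wpm

    def toggle_speed(self):
        """Bound to the space bar: switch between scan and read speed."""
        self.scanning = not self.scanning
        return self.wpm

    def advance(self, count=1):
        """Called by the speech loop as words are spoken."""
        self.position = min(self.position + count, len(self.words))

    def what_was_that(self, back=10):
        """Back up `back` words and drop to read speed, without stopping."""
        self.position = max(self.position - back, 0)
        self.scanning = False
        return self.words[self.position:]
```

Because the position is a word index rather than a highlighted selection, backing up never requires returning to the beginning or rehighlighting the text.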
Problem: TTS readers read everything that you highlight, including all sorts of information you'd rather skip, such as funky Web addresses or references. Perhaps even more annoying is how TTS readers try to force correct grammar onto text in which it is incorrect, such as in hurried notes you write to yourself.
Solution: TTS readers need to take context into account when reading. Web pages, for example, often have lots of information that you need not read aloud, such as navigation bars, because you aren't yet interested in leaving a particular page until you know what's on it.
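Even a crude filter would remove the worst offenders. The Python sketch below is one possible approach under simple assumptions: Web addresses start with "http://", "https://", or "www.", and references look like bracketed numbers such as "[3]". The names `strip_unspoken`, `URL_PATTERN`, and `BRACKET_REF` are made up for this example. Real context awareness, such as skipping navigation bars, would need far more than this.

```python
import re

# Assumed shapes of the text a listener would rather skip.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
BRACKET_REF = re.compile(r"\[\d+\]")


def strip_unspoken(text: str) -> str:
    """Remove Web addresses and bracketed reference numbers before reading."""
    without_urls = URL_PATTERN.sub("", text)
    without_refs = BRACKET_REF.sub("", without_urls)
    # Collapse the gaps left behind by the removals.
    return " ".join(without_refs.split())
```

One known limitation of this sketch: punctuation that directly follows an address is treated as part of the address and removed with it.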
None of these problems is difficult to solve in and of itself. TTS conversion, as a technology, works well; the interface is the problem. Yet each of these problems alone is enough to render a TTS reader difficult to use under typical conditions.


Typing on a cell phone
Given the restrictive input devices and limited displays accessible to disabled users, assistive-technology companies have learned to maximize the efficiency of I/O. For example, Palm Pilots use color more or less for decoration. In a symbol grid, by contrast, color can designate that a particular cell is not a symbol but rather a link to a page of additional symbols.
The ability of a user to communicate varies depending upon his or her disability, and assistive-technology vendors have had to learn to be creative and efficient when building user interfaces. For example, some people's means of communicating ideas is limited to the press of a button. Such circumstances require a method for making that button mean any conceivable idea. A user could employ a method such as Morse code, or a more common technique, called scanning. With scanning, the computer creates a grid of options. Users select one by cycling through rows, pressing a button to select a row and then cycling through the columns of that row to select the correct option. The grid can hold ideas as fundamental as letters and numbers as well as words or complete phrases. Depending on the grid you're using (and its predictive accuracy), you can communicate at a rate of several words per minute.
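To make the mechanics of scanning concrete, here is a small Python sketch that counts how far the highlight must travel to reach a given cell. The grid layout is hypothetical (letters placed roughly by how often they occur in English), and the function name `scan_steps` is an invention for this example. The sketch assumes the highlight starts on the first row and, after a row is chosen, on that row's first cell.

```python
# Hypothetical layout: common letters sit near the top-left corner.
GRID = [
    ["e", "t", "a", "o"],
    ["i", "n", "s", "h"],
    ["r", "d", "l", "u"],
]


def scan_steps(grid, target):
    """Return (highlight_steps, button_presses) needed to select `target`.

    The highlight advances one row per step until the user presses the
    button, then one cell per step along that row until a second press.
    """
    for row_index, row in enumerate(grid):
        if target in row:
            col_index = row.index(target)
            # The highlight already rests on row 0 and then on cell 0,
            # so reaching index n takes n advances.
            return row_index + col_index, 2
    raise ValueError(f"{target!r} is not in the grid")
```

Under these assumptions every selection costs exactly two button presses. What varies is the waiting time, which is why the arrangement of the grid matters so much.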
Compare this rate with the one at which you can type using a limited cell-phone keypad. The rise of 2.5 and 3G wireless, with its promises of e-mail and video, sounds more like a Twilight Zone version of hell when you actually want to type in something longer than, "Hi mom!" My cell phone has 16 buttons plus a power button, but almost half the buttons are unused when I want to type. Why is it that a person with only one button can type faster than I can? Because companies have let the telephone interface—the number pad with its legacy distribution of numbers (typing an "s" requires four presses of the "7" key)—dictate the layout of the text-messaging interface.
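The cost of the legacy keypad is easy to tally. This Python sketch counts key presses for the standard letter assignment on a telephone keypad. The function name `multitap_presses` is made up for this example, and the count ignores spaces, punctuation, and any pause needed between two letters that share a key.

```python
# Standard letter assignment on a telephone keypad.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}

# Map each letter to the number of presses it needs on its key.
PRESSES = {
    letter: index + 1
    for letters in KEYPAD.values()
    for index, letter in enumerate(letters)
}


def multitap_presses(message: str) -> int:
    """Total key presses for the letters in `message` (other characters ignored)."""
    return sum(PRESSES[ch] for ch in message.lower() if ch in PRESSES)
```

By this count, the five letters of "Hi mom" take 10 presses, and a single "s" takes four.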
Some might suggest that voice is the cell-phone interface of the future. Voice-recognition engines, however, are often limited to command-and-control applications. Dictating e-mails requires a larger vocabulary and a more powerful engine than can currently fit inside most embedded devices. Moreover, applications that use voice recognition typically use it within the confines of an existing text/GUI interface. Thus, even if all you want to do is, say, change the volume of the phone, you still have to step through several layers of topical menus. You can't just say, "Volume."
Typing isn't the only area in which cell phones stand to learn from the assistive-technology industry. Surfing the Web on a cell phone poses some real challenges. The cell-phone vendors appear to think folks on the Web will completely revamp their pages for mobile users. However, revamping gives cell phones access only to pages made for them, not necessarily the ones people need to access on the road (like a competitor's or customer's Web page). From the assistive-technology perspective, although some sites are tailored to visually impaired users, these people want access to all sites. Some software already filters sites, collects relevant text, and allows listeners to jump from element to element. Such software could enable a cell phone to avoid trying to squeeze lots of text into a tiny screen and take advantage of its greater strength, voice.


The human side of technology
When I began researching this piece, I focused entirely on technology and what it could do for people. It wasn't until one of the disabled people I asked to interview refused to meet with me that I realized that turning technology into useful products involves such untrackable market forces as emotion, fear, uncertainty, stubbornness, and a plethora of other intangibles.
People react to technology in unpredictable and often seemingly irrational ways. For example, a common problem in the assistive-technology market is that many boys refuse carrying cases for their speaking devices—no matter how much sense it makes to have one—because the cases look like purses. As another example, marketing folks often ask why Grandma doesn't use the Internet. Is this issue really about the technology or is it about ease of use? Or are there factors at work that even Grandma may be unaware of?
The world of assistive technology is an industry of specific needs and requests. Someone may need a handle on the side of a device, without which, the device would be useless to this person. Others may need to mount the device to a wheelchair. Still others may need the device to accept commands from a scanning device. This industry has learned to identify and address individual differences, to include flexibility in designs.
The world of the desktop computer, PDAs, and complex electronics is very different. These devices were designed by and for technical people. If you aren't an IT manager, you'd better know where to find one if your hard drive crashes. And even if you're willing to personally help Grandma set up her computer, she may still refuse. Does she feel that she can't learn all those "intuitive" features of computers? Is she afraid? Or perhaps she feels that she's gotten along this long without one, so why start now?
I still don't know why my source refused to meet with me. He stood only to gain, because I might have had him try a device that could immeasurably improve his quality of life. From my perspective, it appears there is technology that can help him. Perhaps he doesn't believe he can be helped. Or maybe he knows something he isn't telling.
Marketing managers will tell you it's the technology—the better sound or the better picture—that drives the sale. Perhaps that's just what they say because they don't really know why we buy what we buy. All I know is that it doesn't matter how good the technology is or how much we need it; if we don't think we want it, we won't buy it.


Why voice dictation still stinks
Three main groups drive the speech-recognition market: professionals, such as doctors and lawyers, whose secretaries take dictation; early adopters desperate to continue their battle to not learn how to type; and users who, for whatever reason, cannot type. However, a person in one of these groups is different from a user who can type at a decent rate and has to transcribe his or her own dictation.
Correcting the text that a speech-recognition engine outputs is a complex process of highlighting incorrectly identified words and typing corrections. Consider that, with 95% accuracy, one in 20 words is still incorrect. Spending the time to make corrections will supposedly raise this accuracy over time. However, the process of correcting requires a significant time investment. For professionals with secretaries to verify and correct mistakes, speech recognition actually reduces the time it takes for the secretary to accurately transcribe speech to text. However, for users who don't have a secretary to verify their dictation, it takes more time to dictate and correct text than it would to simply type the material in the first place.
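A back-of-the-envelope calculation shows how quickly corrections add up. In the Python sketch below, only the 95% accuracy figure comes from the discussion above. The message length, the dictation and typing rates, and the seconds spent per correction are illustrative assumptions, not measurements, and the function names are invented for this example.

```python
def expected_errors(word_count: int, accuracy: float) -> float:
    """Expected number of misrecognized words at a given accuracy."""
    return word_count * (1.0 - accuracy)


def minutes_to_produce(word_count, words_per_minute, accuracy=1.0, seconds_per_fix=0.0):
    """Entry time plus correction time, in minutes."""
    entry = word_count / words_per_minute
    fixing = expected_errors(word_count, accuracy) * seconds_per_fix / 60.0
    return entry + fixing


# Assumed scenario: a 500-word message.
# Dictating at 100 words per minute, 95% accuracy, 15 seconds per fix.
dictated = minutes_to_produce(500, 100, accuracy=0.95, seconds_per_fix=15)
# Typing at 50 words per minute with no recognition errors to fix.
typed = minutes_to_produce(500, 50)
```

With these assumed numbers, the 500-word message contains about 25 misrecognized words, and dictating plus correcting takes roughly 11 minutes against 10 minutes of typing. Different assumptions would move the break-even point, but the correction term grows with every word dictated.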
For users who cannot type or use a mouse, voice commands can drive the correction interface, but only to a point. Sometimes the speech-recognition engine cuts a word in half, and you need to include the two halves as you make your correction. This task is difficult if you aren't using a mouse.
Note that there is a significant difference between voice dictation and voice recognition. "Voice dictation" means recognition of a large vocabulary, usually without much context. It relies on the limited predictive ability of the speech-recognition engine to determine what comes next. "Voice recognition," on the other hand, is finding its way into many applications, such as cell phones, in what is called a command-and-control interface. Given a small vocabulary of, say, 50 words, recognition accuracy is much greater and takes less time to process.


Food for thought
For my friend Jonathan, who has little use of his hands and legs and is unable to speak, I selected the Qbe Vivo tablet computer from Aqcess Technology. I wanted to see how we could use a portable computer with a touchscreen to improve his ability to communicate. One area in which I know his family has difficulty is in figuring out what Jonathan wants for lunch. With only the ability to click his tongue, it is challenging for Jonathan to ask for uncommon foods. Even requests for common foods, such as raisins, can take a fairly involved round of 20 questions to determine.
I decided to address this problem by building a demo Web page. One frame supplied the topic list: meats, veggies, fruits, and others. Jonathan can touch any topic, and the second frame displays a page of individual items relating to that topic. For example, the meat page shows steak, chicken, eggs, and cheese. Touching any individual item highlights the item and causes the browser to speak the food selected. Thus, with two touches, Jonathan can identify a food he wants.
A key factor is that building this demo took my friend Tom and me an hour and a half using FrontPage. More than half of this time we spent scanning the Web for interesting images to use. We wanted images for several reasons. Increasing the size of words so that a finger can easily select them results in shapes that are much wider than they are tall. Also, we could better control the size of images to build a usable page. Finally, a list of words requires reading, but a page of images allows scanning.
Another chunk of time went into figuring out how to get FrontPage to put up two frames, link them, and tie the spoken food name to touching the image. I also recorded all of the spoken words. With a few more hours, I could have easily added a third frame that collected the names of the foods Jonathan pressed in a list. Developing the food page has given me a template for building other pages that cover topics such as feelings, places to go, lists of friends, and so on. With a bit of documentation or instruction, a nonprogrammer could use this template to create custom Web pages.
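The template idea can also be expressed as a small generator script. The Python sketch below is not how the demo was built (that was done by hand in FrontPage). It is one possible way to turn a topic and a list of items into a page of touchable images. The function name `build_topic_page`, the `images/` and `sounds/` folder layout, the 200-pixel image size, and the use of the browser's `Audio` object to play a recording are all assumptions made for illustration.

```python
import re
from html import escape


def build_topic_page(topic, items):
    """Return HTML for one topic page: one large, touchable image per item."""
    cells = []
    for item in items:
        label = escape(item, quote=True)
        # File-name-safe version of the item, e.g. "Ice cream" -> "ice_cream".
        slug = re.sub(r"[^a-z0-9]+", "_", item.lower()).strip("_")
        cells.append(
            '<img src="images/{slug}.jpg" alt="{label}" width="200" height="200" '
            'onclick="new Audio(\'sounds/{slug}.wav\').play()">'.format(
                slug=slug, label=label
            )
        )
    return (
        "<html><head><title>{title}</title></head>\n<body>\n{body}\n</body></html>"
    ).format(title=escape(topic), body="\n".join(cells))


# Example: the meat page described above.
meat_page = build_topic_page("Meats", ["steak", "chicken", "eggs", "cheese"])
```

A new topic then needs only a list of names plus matching image and sound files, which keeps page building within reach of a nonprogrammer.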
Although I admit that I am not an expert at building pages, these pages significantly increased Jonathan's ability to communicate. Because any browser can read these pages, I can post them on the Web for other users to download. This ability to exchange pages allows the assistive-technology community to develop tools and support for itself at little or no cost.


Symbolic interfaces
Given the range of challenges that its users face, the assistive-technology industry has been forced to break free of the one-interface-fits-all paradigm. This change results in interfaces shaped to fit both the information being communicated and the individual receiving it. One technology used in many assistive-technology products is the symbol application, in which text words are tied to pictures. A device that speaks for a person is an example of a symbol application. By selecting the proper symbol or word from a grid of symbols, users can quickly craft sentences or phrases. Additionally, a symbol for "feeling" can open a page of more specific symbols, such as anger, love, and hate, giving users fast and consistent access to common words.
Symbols are an example of breaking out of the constrictive box of thinking that all communication has to be in one format. Reading need not be limited to only written words. Traditionally, pictures have been used to complement text to explain difficult ideas. However, you can also use them to explain individual words or ideas. For example, I also teach at the University of California–Berkeley and receive many cryptic e-mails from foreign students. A student could write the e-mail in English, and then the computer could translate it to symbols as a sort of meaning-checker, similar to a spellchecker. If the student uses the wrong word, the device would display the wrong picture. Symbols could also give people struggling to understand words a way to learn them in context.
Currently available assistive-technology software using symbols allows an illiterate or disabled user to "write" e-mails that would be indistinguishable from those written with actual words. The symbols are used to construct a text message; they do not appear as part of the e-mail. The symbol engine could also parse received e-mails, to convert them to symbols. A person corresponding with a symbol writer might never know the difference.
Symbols also provide a means for translation. Translation devices have traditionally taken the difficult path of working with natural language (colloquialism, for example) and trying to map complex meaning to other languages. The problem with such a path is that the translation device needs to understand the context of the message to complete the translation. Working with symbols, however, offers a path for executing translation without full computer comprehension. Users could resolve ambiguities in meaning. For example, many English verbs and nouns have the same spelling, and they may also have several meanings, all of which are different words in another language. Take the word "sail," for example. There's the sail on a boat, the action of sailing a boat, and the potentially confusing homophone "sale," which has a completely different meaning. Users could write sentences in symbols, or, in the case of ambiguities, the computer could display various symbols to help clarify the meaning.
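The division of labor described here, in which the computer offers candidate symbols and the user picks one, can be sketched in a few lines of Python. Everything in the sketch is hypothetical: the symbol identifiers, the tiny word lists, the choice of Spanish as the target language, and the function names. It stands in for whatever a real symbol engine does internally.

```python
# Hypothetical symbol inventory: each English word maps to one or more symbols.
SENSES = {
    "sail": ["SAIL_NOUN", "SAIL_VERB"],
    "boat": ["BOAT"],
}

# Each symbol, being unambiguous, maps to exactly one word in the target language.
SPANISH = {
    "SAIL_NOUN": "vela",
    "SAIL_VERB": "navegar",
    "BOAT": "barco",
}


def candidate_symbols(word):
    """Symbols the computer would display for the user to choose from."""
    return SENSES.get(word.lower(), [])


def translate(word, choose):
    """Translate `word`, asking `choose` to pick a symbol only when ambiguous."""
    candidates = candidate_symbols(word)
    if not candidates:
        raise KeyError(word)
    symbol = candidates[0] if len(candidates) == 1 else choose(candidates)
    return SPANISH[symbol]
```

Here `choose` represents the user touching one of the displayed pictures. The computer never has to understand the sentence. It only has to notice that more than one symbol fits.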


Getting touchy-feely with your computer
As you might expect, some electronic Braille devices dynamically control a full line of Braille text. However, such technology brings only captured text to a blind user. Given that desktops have shifted to spatial-and-graphic interfaces, the true challenge can be in locating text to capture. Several devices propose to enable users to "see" graphics through mice that provide tactile sensation. The choice of the mouse as the feedback device makes sense given the sensitivity of the hands and fingers, as well as the fact that users still need a mouse to navigate between applications.
VirTouch approaches the problem with three finger-sized pads of pins that rise and fall as the mouse scans the screen. (The device costs $4900 with software.) As the cursor passes over a graphics line, for example, the device propagates the line across the pins so the user can feel the mouse passing over the line. Thus, you can use the device to examine flow charts or maps. Two flavors of feedback technology from Immersion have found their way into Logitech mice. The $49.95 Wingman is a mouse that is anchored to a plastic base. When used in shooting games, for example, the base jerks the mouse to create the sensation of kickback from a gun. The $39.95 iFeel Mouse, on the other hand, is a self-contained unit with an Inertial Harmonic Drive engine that can vibrate the mouse to feel like a buzz saw (when you point the cursor over a buzz saw); feel rough or smooth depending on the "surface" the cursor is passing over; or offer a physical "click" that feels like you're running a stick across a corrugated surface when the cursor runs down a pulldown menu. The device enables visually impaired users to determine when the mouse has actually rolled over an icon or to count the items in a list.
One challenge that tactile interfaces face is that there is little if any support for them in applications. The Logitech devices offer feedback in general Windows environments (for example, icons and menus), but the gun-kickback effect works only with games that take advantage of the device's API; they are devices the application must be aware of. The VirTouch mouse works outside applications and thus offers feedback across every application, which might help explain the higher price tag. The lack of applications may slow adoption of the iFeel mouse, but, given that its price is relatively close to that of standard mice (Logitech has already sold more than 250,000 of them), it has a chance of entrenching itself as tactile technology gains acceptance.
Feedback technology comes into play in environments in which users are limited in their use of vision. BMW, for example, is looking to implement radio-control knobs that provide tactile feedback to drivers based on which subsystem the knob currently controls. Thus, you could use the same knob to control a radio or a fan, and the knob would provide the kind of clicking or resistance that drivers are used to feeling from each subsystem's controls. This technology reduces the number of mechanical controls necessary on a console, which in turn reduces the amount of space that the console requires.
Another tactile application worth noting is the $995 Tactile Image Enhancer from Repro-Tronics Inc. It's a printer that cooks raised surfaces onto Flexi-Paper (at less than $1 for an 8.5×11-in. sheet). The printer is more flexible than Braille printers in that it can print raised images, such as maps or large text.



Author Information

 You can reach Technical Editor Nicholas Cravotta at 1-510-558-8906, fax 1-510-558-8914, e-mail ednnick@pacbell.net.





 

ACKNOWLEDGMENT

For their invaluable contributions to this article, special thanks go to Fanita and Benjamin James, Jonathan Hoefs and family, and Tom and Lisa Nau. Thanks also to Compaq for the use of an Armada notebook, IBM for Via Voice speech-recognition software, Philips for cameras and OCR software, ABBYY for OCR software, and Lucent for an Orinoco wireless gateway.

 

 
 
 

