You, robot
The high-tech world is littered with failed attempts to make computers that seem
like people. What makes a linguist think she can succeed where the techies haven't?
by Tom Scocca
On a dark, nondescript street, seen from two angles through two
operating-system windows on a jumbo computer monitor, a sort-of-personal moment
is about to take place. Two legless figures resembling shirt mannequins, one
bronze-colored and one blue, are drifting toward one another, steered by Hannes
Vilhjálmsson, a third-year graduate student at the MIT Media Lab. Under
the indigo light fixtures of the workroom, Vilhjálmsson works the arrow
keys, and the figures float closer. Closer still, and then, just before they
meet, the mannequins lift their bulbous cartoony heads, flick one dull sidelong
glance at each other, and look away again. They pass, silently.
It is a disconcertingly human thing to see computer-generated images do. The
figures, created by a program called BodyChat, are rough -- legless, with
Tinkertoy arms and empty space where their necks should be -- but their
movements seem profoundly natural. They are following the bluntest of
directions: each figure is governed by a toggle switch labeled AVAILABLE/
UNAVAILABLE, and both switches are set to UNAVAILABLE. But the instructions are
being carried out with a rare sophistication. If you were to meet one of the
figures on the street and get that glance, you would not misunderstand it; you
would get the message, UNAVAILABLE, as clear as you please. This is what
Professor Justine Cassell and her Gesture and Narrative Language Group have in
mind.
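To get a feel for how little the user actually controls here, consider a minimal sketch of the idea -- in Python, with invented names, and emphatically not the Media Lab's own code: the user flips one switch, and the avatar works out the appropriate gaze behavior on its own.

```python
# Illustrative sketch only -- not BodyChat's actual implementation.
# The point: the user controls a single AVAILABLE/UNAVAILABLE toggle,
# and the nonverbal behavior follows from it automatically.

from dataclasses import dataclass

@dataclass
class Avatar:
    name: str
    available: bool  # the one switch the user touches

    def on_approach(self, other: "Avatar") -> str:
        """Choose a gaze behavior when another avatar comes near."""
        if self.available and other.available:
            # Mutual availability: hold the glance, inviting conversation.
            return "look at other and hold gaze"
        # Otherwise: the brief sidelong glance, then look away.
        return "glance at other, then look away"

bronze = Avatar("bronze", available=False)
blue = Avatar("blue", available=False)
print(bronze.on_approach(blue))  # -> glance at other, then look away
```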
It's a project at which plenty of other people have failed. The
personal-computing era has left, strewn in its wake, a vast array of devices
and programs that were supposed to make machines seem human: bleating speech
synthesizers, annoying on-screen "helpers," near-useless home robots. But
Cassell's work may not meet with the same fate. She is not an engineer or
programmer by trade; her degrees are in comparative literature, linguistics,
and developmental psychology. By applying her knowledge of the
less-than-obvious patterns of human behavior, she hopes to make computers deal
with people on genuinely human terms. And computers -- and people -- seem to be
taking to it.
The computers that most of us use are the products of a very different
philosophy. The windows, the desktop, the little icons representing folders and
documents were born from the idea that computers should blend into the work
environment, not reach out to users. Mark Weiser, the chief technologist at
Xerox's legendarily innovative Palo Alto Research Center, believes computers
should be neither seen nor heard; to him, the computing technology of the
future will be ubiquitous and invisible. The ideal, Weiser says, is for
computers to be so straightforward to use that you won't think about them, any
more than you think about the hammer when you're driving nails.
But some researchers are finding that the hammer is a misleading metaphor.
People may think they prefer the idea of impersonal machines, but MIT
professor Youngme Moon says the interaction between humans and machines is
already a social one.
"Every time it communicates with you, you have a social response," says Moon,
who runs MIT's Social Intelligence Research project, and who has collaborated
with Cassell.
In one experiment, Moon had people perform a series of learning exercises on a
computer, then answer a survey evaluating the computer's performance. She found
that when the computer asked users to critique its work, they would soft-pedal
their responses, the same way people tend to temper face-to-face criticism of
other people. When they were surveyed by a different computer, or with pencil
and paper, their answers were markedly more negative.
Given the way people actually relate to their machines, then, making the
machines more humanlike seems inevitable. Indeed, the idea of building
artificial people dates back thousands of years; MIT theologian Anne Foerst
calls it "a very old dream of humankind." It runs through creation myths, the
Pygmalion story, medieval Jewish myths of golems, and tales like Pinocchio
and Frankenstein. And just as people 15 years ago were captivated by
the idea of a home robot, even if it was just a remote-controlled-car chassis
with a mechanical arm, people exploring the world inside their computers are
inclined to look for man-made beings there.
The uses of such technology could be legion. One application would likely be
the building of "animated interface agents," i.e., walking, talking embodiments
of computer programs. In some jobs, on-screen synthetic people could replace
real people -- as clerks, reference librarians, and the like. Phone companies,
Cassell reports, are for some reason smitten with that idea, hoping to staff
their retail communications stores with technologically impressive fake
salespeople.
The other obvious application is to the online world; many people who interact
online are looking for a way to make avatars, their graphical stand-ins in
cyberspace, seem more human. "Conversations are definitely better if you have
bodies," Cassell says. But currently, even the chat groups that offer graphical
avatars have figures that you simply drag around the screen; some of the
fancier ones can execute a series of arbitrary gestures.
As it stands, both animated agents and avatars run to the pointless-seeming or
the creepy. Cassell recalls an unsuccessful desktop agent, or on-screen helper,
with a grin so relentless that "people didn't want to go near their computers."
And one type of avatar, she says, looks at its watch at random moments -- even
if the person it's chatting with is relating something like news of a death in
the family.
Xerox's Weiser sees such failures -- particularly that of the much-touted
"friendly" desktop agent Microsoft Bob -- as evidence that people don't want
human interaction with their computers. But Cassell's view is that the attempts
reflect an "intuition that bodies and faces are important." The problem, she
says, is that designers have had no idea how human gestural communication
really works.
They're not alone in this. The actual way people use gesture is a subject that
remains little understood and much overlooked. When Cassell decided to make a
study of it, at the University of Chicago, she says she found that it was "the
poor stepchild" -- too nonverbal for linguists, too communicative for
psychologists.
Most people didn't even recognize that there was anything much to study. For a
long time, Cassell says, people believed that visual cues simply echoed what
was being said aloud.
In fact, she says, gesture is a communicative channel of its own, one that
interacts with spoken language to convey additional information. By gesturing,
a speaker can describe scenes spatially, reinforce relationships with
listeners, or add physical and metaphoric detail to a message.
English speakers, for instance, augment our language's underexpressive
verbs with pantomime -- as we say we "went" somewhere, Cassell explains, our
hands make walking or driving gestures. We shape out the positions of objects
we discuss. We explain our point of view.
The key thing is that we don't know we're doing it. Thumbs-ups and
bird-flippings aside, the gestures we make aren't conscious or voluntary. But
they happen nonetheless -- and they convey information. Cassell ran an
experiment in which an actor recounted the plot of a Sylvester-and-Tweety
cartoon short while using hand gestures that added new information to, or in
some cases contradicted, the spoken story. In one instance, the narrator marked
the positions of Tweety and Sylvester with his left and right hands,
respectively, then said Sylvester lunged at Tweety -- but moved the Tweety hand
quickly toward the Sylvester hand. When asked to retell the story, the
observers mingled the information from the gestures and the spoken version so
that, among other things, Tweety took a turn going after Sylvester.
Because people aren't consciously aware of such communication, computer
avatars that let people dictate their movements fail. "[Gestures] are automated
in us," Cassell says. "You don't know you're doing them." Cassell herself
knows, of course; she has trained herself to remember and re-create gestures
and is constantly aware of how she and the people around her are moving, the
way one might pay attention to a foreign language. ("I'm not a native," she
says.)
For the vast majority that takes gesture for granted, deliberate gesturing
comes as a distraction. "Dukakis was a good example," Cassell says. The
erstwhile presidential candidate's handlers told him his hand movements were
too busy and ethnic-looking, she says, so he worked hard on changing them.
Cassell breaks into a brief, uncanny imitation of the ex-governor's stump
manner, chopping at the air with tight little strokes. "What people noticed was
that he wasn't trustworthy," she says. His behavior didn't fit his words.
To put people at ease, then, a computer that uses gesture should move
seamlessly and meaningfully, as little like Michael Dukakis as possible. The
gesture group's project to that end is called Gandalf, a "multimedia
interactive humanoid agent." A sort of very poor man's Max Headroom, Gandalf
is a screen display of a crudely rendered cartoon head, fat-cheeked and
Viking-helmeted, with a floating hand alongside.
With the user hooked up to a complicated harness and headset to track gaze and
posture (future versions will use cameras instead), Gandalf engages him or her
in a discussion about the solar system, looking from the human to a second
screen showing the planets and back again. In a video of a session, the
conversation is stilted -- Gandalf is designed to converse about the solar
system, not to say particularly interesting things about it -- but
continuous.
Clunky as it is, Gandalf taught the group one of its most important lessons:
emotion doesn't seem to matter. Gandalf's conversational routine can be divided
into two sets of gestures: emotional ones (smiling, frowning, knitting its brow
in puzzlement) and communicative ones (nodding, pointing, and turning to face
different directions). In one test, the lab alternately disabled each set of
behaviors, so that Gandalf was using only emotional gestures, or only
communicative ones. The emotionally deficient version of Gandalf, they
discovered, could carry on a conversation just fine; without the ability to nod
and point, however, its conversation quickly derailed. "The emotional stuff was
not what made this agent intelligent or easy to use," Vilhjálmsson
says.
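The logic of that test is easy to reconstruct in miniature -- what follows is a hypothetical sketch, assuming nothing about the lab's actual implementation: tag each behavior with a category, and enable only one category per run.

```python
# Hypothetical reconstruction of the ablation test's logic -- the
# gesture names are illustrative, taken from the article's description.
GESTURES = {
    "smile": "emotional",
    "frown": "emotional",
    "knit brow": "emotional",
    "nod": "communicative",
    "point at planet": "communicative",
    "turn to face user": "communicative",
}

def allowed(enabled_categories: set) -> list:
    """Return the gestures the agent may still perform in this run."""
    return [g for g, cat in GESTURES.items() if cat in enabled_categories]

# Run A: emotional gestures only -- the agent can smile and frown but
# not nod or point; this is the condition in which conversation derailed.
print(allowed({"emotional"}))
# Run B: communicative gestures only -- conversation carried on just fine.
print(allowed({"communicative"}))
```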
Many of today's designers, he says, don't get that point. "The emphasis is to
create emotionally rich avatars. But that totally jumps over a whole level,
which is the communicative layer." In everyday life, he explains, we rarely
know the emotional condition of the people we encounter, yet we manage to
interact with them anyway. When the supermarket cashier rings up your
purchases, Vilhjálmsson notes, "you have no idea what emotional state is
being portrayed" -- yet you have no trouble getting your change.
In Cassell's estimation, the purpose of these electronic companions is not to
share their feelings, but to be responsive and accessible. Socially competent
computers matter because they offer the prospect of new kinds of interactions.
Rather than computers disappearing into the woodwork, the way Weiser imagines
things, Cassell sees them taking on new relevance. She is particularly
interested in getting children to use computers for storytelling and
self-guided activity; the next table over from the BodyChat computer holds a
jumble of hardware-riddled stuffed animals employed in such projects.
Cassell herself "was a very building kind of kid," she says -- one who made
dollhouse furniture despite not having any dolls. "I didn't have standard
toys," she recalls. "I wanted technology."
Her office now, which she shares with a lean-faced and phlegmatic dog named
Esme, partly makes up for that, with playthings spread out all over the
shelves. Prominent among them is a cluster of Barbies (and Sindy Dis-moi tout,
a French Barbie knockoff), a kind of toy that Cassell says she never really
looked at till she visited Mattel a few years ago. "I got interested in what
image this sends to girls," she says. "I think it's probably very satisfying to
have a role-consonant toy . . . a toy that fits with what lots of
kids are telling you it means to be a girl."
She presses a button on the back of an "interactive" talking Barbie, and it
pipes up to ask, in a sentence audibly cobbled from randomly picked phrases, if
Barbie and Cassell could "go dancing/with Ken/after the game."
With that, the appeal of socially intelligent machines comes into clearer
focus. "Half the population has been less included in the technological world,"
she says. Video games, which are most kids' point of entry into the computer
world, are generally "boy-friendly," emphasizing the gender distinction. For
Cassell, part of the appeal of storytelling software or toys that can interact
intelligently is that they draw girls into technology, while also encouraging
boys to focus on exploring social relationships.
This is not, however, an idea that most computer people have time for. Until a
generation weaned on such technology grows up, the business will remain in the
hands of the existing boys, who've traditionally conceived of the pinnacle of
computing power as what Anne Foerst refers to as "a disembodied male."
Advancement is identified with building shitkicker processors, refining and
redoubling the computer's ability to match one particular set of human
capabilities. These are the people whose inventions beat Garry Kasparov at
chess.
With that comes the self-confidence peculiar to engineers and hard
scientists, the suspicion that other fields of inquiry don't have much to
offer. "With any technical person," Cassell says, "if you have a background in
the social sciences, you have to show you have something that they need to
know."
She finds there's further resistance yet to hearing things from someone who
is, like Cassell, small and female. At conferences, she says, "everyone assumes
I'm 12. Being female and young-looking means that when a conversation starts in
a professional situation, I have to assert my status."
But the traditional indifference to Cassell's field of interest also creates
opportunity. Though nobody has done much to bring social intelligence into
computing, the computational power to do it is sitting around waiting to be
used.
The limitations of the group's work are not, by and large, technological ones.
Once a set of rules for the avatars' conversational behavior had been worked
out, Vilhjálmsson says, "it was trivial to create the system."
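What might such rules look like? Here is a guess at their flavor -- the events and behaviors are illustrative, drawn from the turn-taking patterns described above rather than from the group's code:

```python
# A speculative sketch of a conversational rule table. Once the mapping
# from conversational event to nonverbal behavior is written down,
# the lookup itself really is trivial.
RULES = {
    "other approaches, both available": "raise head, hold gaze",
    "other approaches, either unavailable": "brief glance, look away",
    "starting a turn": "look away briefly while planning the utterance",
    "finishing a turn": "look back at the listener",
    "giving feedback": "nod",
}

def react(event: str) -> str:
    """Look up the behavior for a conversational event."""
    return RULES.get(event, "idle")

print(react("finishing a turn"))  # -> look back at the listener
```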
And those glance-trading mannequins on their deserted street have become
something of a hot ticket. The market is itching to see BodyChat turned into
product. Vilhjálmsson has been invited to international computer
conferences to discuss his research, and is avidly courted by industry.
Clearly, the work is filling a void.
"The technology's already here to create the things we should be doing," Moon
says. "What you really need in this field is people who understand people."
"I humanize the interface to allow people to reflect on their own humanity,"
Cassell says. "Some people are in this to make computers more like humans. I
want to enable humans to remain human."
Tom Scocca can be reached at tscocca[a]phx.com