Human arms serve (at least) two separate functions: they permit an agent/avatar to change the local environment through dexterous activities, reaching for and grasping (gaining control over) objects [17,14], and they serve social interaction by augmenting the speech channel with communicative emblems, gestures and beats [9].
For the first function, a consequence of human dexterity and experience is that we are rarely told how to approach and grasp an object. Rather than have our virtual humans learn how to grasp an object through direct experience and error, we provide assistance through an object-specific relational table (OSR). Developed from ideas about object-specific reasoning [28], the OSR has fields for each graspable site (in the Jack sense of an oriented coordinate triple) describing the appropriate handshape, the grasp approach direction, and, most importantly, its function or purpose. The OSR is created manually for graspable objects; it allows an agent to look up an appropriate grasp site given a purpose, use the approach vector to guide the inverse kinematics directives that move the arm, and know which handshape is likely to yield reasonable finger placement. The hand itself is closed on the object using local geometry information and collision detection.
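As a concrete illustration, the following is a minimal sketch of what an OSR entry and purpose-driven lookup might look like. The type and field names (GraspSite, OSRTable, lookup) are hypothetical conveniences for this sketch, not the actual Jack data structures.

    // A minimal sketch of an object-specific relational (OSR) table.
    // Names and field choices are illustrative assumptions, not the
    // actual Jack implementation.
    #include <optional>
    #include <string>
    #include <vector>

    struct Vec3 { double x, y, z; };

    // One graspable site: an oriented frame on the object, plus the
    // grasp knowledge the agent would otherwise have to learn.
    struct GraspSite {
        std::string purpose;    // e.g. "pour", "carry", "hand-over"
        Vec3 position;          // site origin in object coordinates
        Vec3 approach;          // preferred hand approach direction
        std::string handshape;  // e.g. "power", "precision", "hook"
    };

    struct OSRTable {
        std::vector<GraspSite> sites;

        // Return a grasp site serving the agent's current purpose,
        // if the object offers one.
        std::optional<GraspSite> lookup(const std::string& purpose) const {
            for (const auto& s : sites)
                if (s.purpose == purpose) return s;
            return std::nullopt;  // no site serves this purpose
        }
    };

Given a successful lookup, the returned approach vector would feed the inverse kinematics directives and the handshape would select the pre-grasp hand posture, as described above.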
The second function is non-verbal communication. Gestures can act as metaphors for actual objects, indicate (via pointing) locations or participants in the virtual space around the speaker, or augment the speech signal with beats for added emphasis [9]. We are currently working on embedding culture-specific and even individual personality variations in gesture. The potential interference between the practical and gestural functions is leading us to a resource-based priority model to resolve conflicts.
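One plausible reading of such a resource-based priority model is sketched below: the arm is treated as a single resource, and competing practical and gestural requests are arbitrated by priority. The ArmRequest/ArmResource names and the numeric priority scheme are assumptions for illustration, since the model itself is still under development.

    // A minimal sketch of resource-based arbitration between reaching
    // and gesturing. The numeric-priority scheme is an assumption; the
    // actual model in the source is work in progress.
    #include <optional>
    #include <string>

    struct ArmRequest {
        std::string task;  // e.g. "reach cup", "beat gesture"
        int priority;      // higher value wins the arm resource
    };

    class ArmResource {
        std::optional<ArmRequest> holder_;
    public:
        // Grant the arm to the higher-priority request; an ongoing
        // lower-priority task is preempted.
        bool acquire(const ArmRequest& req) {
            if (!holder_ || req.priority > holder_->priority) {
                holder_ = req;
                return true;
            }
            return false;  // arm stays with the current task
        }
        void release() { holder_.reset(); }
    };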
Arm control for avatars requires rapid positioning and orienting of the hands for both reaching and gesture, so fast computation of arm joint angles is essential. In recent work we have moved beyond iterative inverse kinematics [46] to analytic formulas that easily keep up with a live performance or a motion synthesizer emitting streams of end-effector positions and orientations [44]. By extending this idea to the whole body, multiple individuals (3-10 on an SGI RE2) can be controlled in real time from arbitrary end-effector and global body data alone [47].
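To convey why analytic formulas keep pace with live streams, here is a deliberately simplified planar two-link example: the elbow and shoulder angles follow in closed form from the law of cosines, with no iteration at all. This sketch is only the flavor of a closed-form solution; the cited work [44] solves the full 3-D position and orientation problem for the human arm and body.

    // Closed-form IK for a planar two-link arm: constant cost per
    // frame, no iteration. A simplification of the analytic approach;
    // not the formulation of [44].
    #include <cmath>
    #include <optional>

    struct Angles { double shoulder, elbow; };

    // Upper-arm length l1, forearm length l2; target (x, y) given in
    // the shoulder frame. Returns no value if the target is out of reach.
    std::optional<Angles> planarArmIK(double l1, double l2,
                                      double x, double y) {
        double d2 = x * x + y * y;
        double c = (d2 - l1 * l1 - l2 * l2) / (2.0 * l1 * l2);  // cos(elbow)
        if (c < -1.0 || c > 1.0) return std::nullopt;           // unreachable
        double elbow = std::acos(c);                            // elbow-up branch
        double shoulder = std::atan2(y, x)
                        - std::atan2(l2 * std::sin(elbow),
                                     l1 + l2 * std::cos(elbow));
        return Angles{shoulder, elbow};
    }

Because each frame costs a fixed handful of trigonometric operations, joint angles can be recomputed for every incoming end-effector sample, which is what makes tracking a live performance stream feasible.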