Puppetology: Science or Cult?

by Brad deGraf and Emre Yilmaz

From the Ryder Encyclopedia of Music (Paramus, NJ, 1951):

"In 1809, maverick opera producer Philippe Brodatz attempted to put on an all-brass production of Monteverdi's elegant and delicate masterpiece, The Incoronation of Poppea. The lead tenor and soprano were played by a tuba and a trumpet, and the chorus consisted of thirty-seven slide trombones. This ill-conceived project was an unmitigated disaster. The loud, bombastic brass instruments were completely unable to interpret the delicate score. The tenor himself fainted from the din. As for Brodatz, he was pelted with refuse and left the opera house under armed guard. As this example proves, brass instruments should never be taken seriously in classical music."

Moral: The saxophone isn't a bad instrument. Just don't use it to play classical harpsichord scores. It plays jazz great.
All photos are courtesy of and © Protozoa.
Performance animation is a new kind of jazz. Also known as digital puppetry or motion capture, it brings characters to life, i.e. `animates' them, through real-time control of three-dimensional computer renderings, enabled by fast graphics computers, live motion sampling and smart software. It combines the qualities of puppetry, live action, stop motion animation, game intelligence and other forms into an entirely new medium. Being new, the medium is just beginning to be explored, and has created a lot of controversy, driven largely by the perception that it is cheating, the `Devil's rotoscope,' and is thus somehow not true `animation.'

The `Is it animation?' debate is just semantics (Is puppetry animation? South Park? Mr. Bill?). At Protozoa, we have a wide range of approaches, informed by more than a decade of experimentation and production, and we're more interested in how the medium can be played with to create characters and tell stories in new ways. We'll try here to describe what we see as some of performance animation's unique qualities and how we exploit them, as well as to dispel some of the misconceptions surrounding the technique. At a high level, the qualities of the medium that make it interesting to us derive from: the spontaneity of live performance; the manipulability, weightlessness and fluidity of digital characters; and the autonomy/intelligence that a computer can add to a character's nature.

We dislike the term `motion capture,' because it reinforces a shallow understanding and trivializes the form. (No one would call hip-hop music `sound capture,' even though a large part of it is captured, i.e. sampled, sound.) `Performance animation' is more broadly descriptive and inclusive. It implies a more active creative process and credits the deep heritage underlying the medium.

We realize that motion capture is a good buzzword though, and that it has taken hold as the term of choice for now. It IS descriptive and appropriate for a PART of the process, and for specific applications that don't pretend to go beyond the recording of motion data.

Long before the advent of fast graphics, high-end puppeteers and special effects wizards were using analog and digital `motion capture' devices such as joysticks, waldos, facetrackers and other contraptions to remotely puppeteer mechanical creatures in real time. It was called remote-control animatronics. Instead of putting a hand in a sock, they put a plunger driven by a motor in a sock, and sent instructions to the motor by radio from a dial across the room.

In 1986, Jim Henson, Michael Frith and others at Jim Henson Productions became aware of what real-time computer graphics were capable of, and conceived of a natural extension of remote-control animatronics: replacing the mechanical creature with a digital, rendered one. Now instead of a plunger in a sock, it was a rendering of a sock, animated with software that used the dial for input.

At the time, the only computers capable of fully shaded real-time rendering were $1M-plus flight simulators, so initial experiments with the concept at Digital Productions were done in wireframe only. A couple of years later, the first workstations capable of real-time shading came out, and the medium really began, first with a live performance at SIGGRAPH '88 of Mike the Talking Head by deGraf/Wahrman, and soon after with Waldo C. Graphic for the Jim Henson Hour by Henson and PDI.

An important point here is that none of the early applications used `motion capture' in the sense that most people have of optical or electromagnetic body capture. They were pure and simple extensions of the puppeteering craft.

Fast computers not only provide a way of rendering characters live, they allow characters for the first time to have the semblance of a brain. With a computer in the mix, new kinds and complexities of creatures (hence, the name Proto = original, Zoa = animals) are possible, since human creators aren't required to control every aspect of what a character does. While true intelligence is still a long way off, we are at least at the point of automatic blinking, breathing, hand gestures, locomotion, reflex, intention, etc., and we expect this to be a rich vein of exploration.

Another quality enabled by the computer is digital manipulation and encapsulation. In particular, the ability to represent characters and animation digitally allows layering and editing of motion in a multi-pass manner, comparable to MIDI for music. MIDI allowed musicians to perform one or more instruments at a time, replace a melody played on one instrument with the voice of another, play them back while performing others, etc. until a complete piece is achieved, all the while maintaining the entire piece in pure digital, script form. Performance animation does the same for motion `instruments.'
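As a rough illustration of that multi-pass layering (not Protozoa's actual software), a recorded take can be thought of as a set of named channels holding one value per frame. Re-performing a channel on a later pass simply replaces that channel while the rest play back unchanged. The names `Take`, `layer`, `head_yaw` and `arm_lift` are hypothetical, chosen only for this sketch.

```python
from typing import Dict, List

# Hypothetical representation: channel name -> one value per frame.
Take = Dict[str, List[float]]

def layer(base: Take, overdub: Take) -> Take:
    """Return a new take in which channels recorded in `overdub` replace
    the same channels in `base`; all other channels are kept unchanged."""
    merged = {name: list(values) for name, values in base.items()}
    for name, values in overdub.items():
        merged[name] = list(values)
    return merged

# Pass 1 records the whole body; pass 2 re-performs only the head
# while pass 1 plays back.
pass1 = {"head_yaw": [0.0, 0.0, 0.0], "arm_lift": [0.1, 0.5, 0.9]}
pass2 = {"head_yaw": [5.0, 10.0, 15.0]}
final = layer(pass1, pass2)
```

Because each pass is kept as data rather than baked into images, the earlier take survives untouched and any channel can be re-recorded again later.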

Mocap - One Component of Performance Animation
Performance animation requires three things to happen in a 30th of a second (the time elapsed in a single video frame): motion must be sampled from whatever sources are being used; that motion must be applied to a digital 3D scene representing the various body parts of a character; and that scene must be rendered into a digital image. `Motion capture' is the first step, the sampling and recording of data. The other two steps are where the art is: the creative use of that data in a larger context, along with other means of expression.

Optical tracking systems, electromagnetic body suits, facetrackers, joysticks, a mouse, even a computer program, are all devices that can be sampled, treated as data sources and recorded. Capture provides input for performance animation; it is not the process itself. That is a crucial distinction that is often lost.
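The three per-frame steps, with very different devices treated as interchangeable data sources, might be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any real system: the `Source` type, the two stand-in sources, and the text-only "renderer" are all hypothetical.

```python
import math
from typing import Callable, Dict, List

FRAME_SECONDS = 1.0 / 30.0  # one video frame, as in the text

# A "source" is anything that can be asked for a value at a given time:
# a joystick axis, a body sensor, or (as here) a small program.
Source = Callable[[float], float]

def sine_source(t: float) -> float:
    """Stand-in for a procedural data source (hypothetical)."""
    return math.sin(2.0 * math.pi * t)

def constant_source(t: float) -> float:
    """Stand-in for a hardware dial held at one position (hypothetical)."""
    return 0.25

def sample(sources: Dict[str, Source], t: float) -> Dict[str, float]:
    """Step 1: sample every source at time t."""
    return {name: read(t) for name, read in sources.items()}

def apply_to_scene(scene: Dict[str, float], samples: Dict[str, float]) -> None:
    """Step 2: copy sampled values onto the named scene channels."""
    scene.update(samples)

def render(scene: Dict[str, float]) -> str:
    """Step 3: placeholder 'renderer' that just formats the scene state."""
    return ", ".join(f"{k}={v:.2f}" for k, v in sorted(scene.items()))

def run(sources: Dict[str, Source], frames: int) -> List[str]:
    """Run the sample/apply/render loop for a number of frames."""
    scene: Dict[str, float] = {}
    images = []
    for frame in range(frames):
        t = frame * FRAME_SECONDS
        apply_to_scene(scene, sample(sources, t))
        images.append(render(scene))
    return images

frames = run({"jaw": constant_source, "tail": sine_source}, frames=3)
```

In this sketch, "capture" is only the `sample` call. Everything that makes the result a character lives in how the samples are mapped onto the scene and rendered.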

Optical vs. Magnetic Capture
Many people, when they hear the term `motion capture,' think of a performer with a bunch of bright dots all over their body, i.e. optical capture. This is unfortunate, because most applications of optical systems are `capture only,' i.e. no one yet uses them to drive characters live (though that is becoming possible through recent developments). With optical capture, the captured data must be post-processed offline before it can be applied to a scene.

Without seeing the character live, it is impractical to do anything creative with the motion, since the performer, director, etc. can't see the effect of the motion on the character. A simple example is a character with a giant nose. If the character isn't rendered live, the performer will have no idea if or when the nose is getting in the way of the hands.

This is the source of a lot of the `Devil's rotoscope' controversy that contends that motion capture is simply a slavish recording of data, used to produce a facsimile of animation faster and cheaper than `true' methods. Pure capture is perfect for directly mimicking human motion, such as sports applications, or making a skeleton dance like Michael Jackson. In such cases, the last thing the performer would want to do is watch themselves on a monitor as they do it.

Unimaginatively used, pure capture IS slavish, and is a poor replacement for great animation. But as a crucial part of performance animation, it is a starting point, the necessary method by which direct human control gets into the process.

For that reason, we prefer doing body capture electromagnetically, since we can get full position and orientation information in real-time, and apply it live to characters. As part of this, we use a visor or monitors, on which the character is displayed, just as puppeteers watch their puppets from the audience point of view as they're performing.

Choose Your Weapon: The Creative Uses of Performance Animation
Our goal usually isn't to transcribe precisely the performer's motion; it is to give the performer the best and easiest possible controls to enliven the character. In designing and building live characters, there are many options from which to choose. The choices one makes go a long way toward making or breaking the results. If a project is going to use performance animation, it's best to conceive it that way from the start, and play to the medium's strengths.

What input devices one needs, what kind of performer one needs, the performer-to-character mapping and direction, all crucially affect the results. This is where a lot of our expertise and experience lies. While this is rather different from the set of skills required to be a good keyframe animator, it is similar in that it also requires sensitivity to the character, its personality, and what kinds of movements will look good on the design. For instance, trying to get the performance to read in silhouette has been a puppetry training method even longer than it's been a rule of thumb for animation.

The Man in the Monkey Suit: Applying Human Data to Non-Human Characters
A crucial step in going beyond motion capture is re-proportioning data to fit non-human shaped characters. Many of the characters we've built have had to do this, as ludicrous proportions are a hallmark of `cartoony' character designs. Making human-shaped data work on one of these characters, without introducing ugly artifacts like skating feet, is a challenge and an art.

Ironically, it's often better to have less data than more -- we usually use only 12 body sensors. If you had one sensor for every single moving part of the body, you'd have a lot more information tying you to the human form, but for our purposes we just want enough sensors to convey the broad lines and arcs of the body.
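One simple way to think about re-proportioning, offered here only as an assumed illustration and not as the method we or anyone else actually uses, is to keep the direction of a limb from the performer's sensors but take the limb's length from the character. The function name `retarget_limb` and the sample numbers are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]

def retarget_limb(parent: Vec3, child: Vec3, character_length: float) -> Vec3:
    """Keep the performer's limb direction (parent sensor -> child sensor)
    but use the character's own limb length. Returns the offset from the
    character's parent joint to its child joint."""
    dx = child[0] - parent[0]
    dy = child[1] - parent[1]
    dz = child[2] - parent[2]
    length = math.sqrt(dx * dx + dy * dy + dz * dz)
    if length == 0.0:
        return (0.0, 0.0, 0.0)  # sensors coincide; no direction to keep
    scale = character_length / length
    return (dx * scale, dy * scale, dz * scale)

# Performer's forearm: elbow at the origin, wrist 0.3 along x and 0.4 up.
# The cartoon character's forearm is four times as long.
offset = retarget_limb((0.0, 0.0, 0.0), (0.3, 0.4, 0.0), character_length=2.0)
```

A sketch like this preserves the broad line of the arm but does nothing about artifacts such as skating feet, which need additional constraints on top of the scaling.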

Capturing Things Besides Humans
Less obviously, we've experimented with capturing things besides humans, and with capturing humans in peculiar ways. For instance, we've used a stop motion armature with sensors on it to create movements impractical to do with a real performer. We've used the old Vaudeville horse method to perform four-legged creatures. We've also performed worm-shaped characters by puppeteering four sensors with our hands and feet.

One good example of this creative selection and collection of data is a foam rubber tube we captured using an optical face-tracking system. The physical, dynamic properties of the moving foam rubber can be captured just as well as the physical, dynamic properties of a moving person.

Puppets With Brains
In addition to collecting, using, and applying motion capture data creatively, the computer enables captured motion to be supplemented with procedural animation, dynamic simulations, intentional logic, and other forms of imparting low-level, and eventually high-level, intelligence to characters.

Procedural Animation
It's becoming fairly common among animators to use expressions and procedural animation to take care of a lot of the work. This is a particularly rich area of exploration. For instance, in animating a dinosaur, it's possible to write expressions that open its claw-foot as it's about to land on a surface, and to close the claw-foot again as it's raised. Then the animator only has to take care of foot positions; the toes are computed automatically. Similarly, writing expressions to control a number of low-level features from one high-level attribute makes for much richer characters than could otherwise be practically controlled.
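The claw-foot example can be written as a one-line expression. The sketch below is a hypothetical version: the function name, the `open_below` threshold and the linear falloff are all assumptions made for illustration.

```python
def toe_spread(foot_height: float, open_below: float = 0.5) -> float:
    """Return toe spread in [0, 1]: 1.0 (fully open) when the foot is on
    the surface, easing linearly to 0.0 (closed) at or above `open_below`."""
    if open_below <= 0.0:
        raise ValueError("open_below must be positive")
    h = min(max(foot_height, 0.0), open_below)  # clamp to the active range
    return 1.0 - h / open_below

# The animator keys only the foot height; the toes follow automatically.
heights = [2.0, 0.5, 0.25, 0.0]
spreads = [toe_spread(h) for h in heights]
```

The same pattern scales up: one high-level attribute (here, foot height) fans out to many low-level ones (each toe joint) through small expressions like this.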

The extremes to which this can be taken, and the particular utility of these methods in performance animation, are not so well known. When you're trying to perform everything live, the more motion you can derive, or get `for free,' the better.

We have a few characters which are completely controllable just by using the mouse. Dalph, a prototype game character created by Bay Raitt, is a good example. All the mouse really controls is the speed and direction of the character, and everything else is derived from these. For instance, as he speeds up, he assumes a more menacing, aggressive posture, hunkering over, opening his claws, twitching his fingers, and blurring his spokes. Slowing down, his hovercraft-like bobbing and floating becomes gentler. When steering, his head turns first, followed by the rest of his body. All this automatic animation makes him more compelling to watch and to play with -- and very easy to `perform' well.
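To make the idea concrete, here is a hedged sketch of how a whole pose might be derived from just speed and heading. It is not Dalph's actual setup: the `Pose` fields, the `max_speed` and `body_lag` parameters, and the specific formulas are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Pose:
    hunch: float       # 0 = upright, 1 = fully hunkered over
    claw_open: float   # 0 = closed, 1 = open
    bob_amount: float  # amplitude of the floating bob
    head_yaw: float    # degrees
    body_yaw: float    # degrees

def derive_pose(speed: float, heading: float, prev_body_yaw: float,
                max_speed: float = 10.0, body_lag: float = 0.25) -> Pose:
    """Derive every pose attribute from two inputs: speed and heading.
    Heading wrap-around (e.g. 359 -> 1 degrees) is ignored in this sketch."""
    s = min(max(speed / max_speed, 0.0), 1.0)  # normalized speed in [0, 1]
    # The head snaps to the new heading; the body closes only part of the
    # remaining gap each frame, so it visibly follows the head.
    body_yaw = prev_body_yaw + body_lag * (heading - prev_body_yaw)
    return Pose(hunch=s, claw_open=s, bob_amount=0.5 + 0.5 * s,
                head_yaw=heading, body_yaw=body_yaw)

fast = derive_pose(speed=10.0, heading=90.0, prev_body_yaw=0.0)
slow = derive_pose(speed=0.0, heading=0.0, prev_body_yaw=0.0)
```

Calling `derive_pose` once per frame, feeding back the previous `body_yaw`, makes the body ease into each turn after the head without the performer doing anything extra.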

Dynamic Simulation
Real puppets often incorporate a lot of `secondary motion' into the design. Long fur or hair that drags behind a motion, or arms that dangle and swing, can add life to a puppet. Even a puppet that's only going to have one hand controlling it, and thus not a lot of direct control, can get a lot of `free' motion from physics this way. We can use similar tricks with our digital puppets.

Max Rodentae is a peppy little rodent character we built for the Virtual Ed Sullivan Show on UPN (1998). His performance is supplemented with a number of dynamic simulations -- that is, simulating the physics of masses, springs, gravity and other forces. His tail, ears and belly are all controlled this way. It gives him a lot of nice secondary motion, and adds a sense of weight and believability. The result is a floppy, fun character whose whole body is expressive, and who doesn't betray the human inside.
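A minimal sketch of this kind of secondary motion is a single damped spring that drags a tail tip after the performed tail base. This is a generic mass-spring step under assumed constants, not the simulation used for Max; `stiffness`, `damping` and `mass` are illustrative values.

```python
def step_spring(pos, vel, target, dt, stiffness=40.0, damping=6.0, mass=1.0):
    """Advance one time step of a damped spring pulling `pos` toward `target`
    (semi-implicit Euler: update velocity first, then position)."""
    force = -stiffness * (pos - target) - damping * vel
    vel = vel + (force / mass) * dt
    pos = pos + vel * dt
    return pos, vel

def follow(targets, dt=1.0 / 30.0):
    """Return the simulated follower position for each frame of `targets`."""
    pos, vel = targets[0], 0.0
    out = []
    for target in targets:
        pos, vel = step_spring(pos, vel, target, dt)
        out.append(pos)
    return out

# The performer snaps the tail base from 0 to 1 and holds it for 4 seconds.
tip = follow([0.0] + [1.0] * 120)
```

With these constants the tip lags behind the base, overshoots, and then settles, which is the floppy follow-through that reads as weight.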

Cross-media Migration
Because of the digital nature of 3D animation, it travels well from television to the Web to live appearances, and to the myriad of hybrid media we can't even imagine right now. This is akin to PostScript, which allows page descriptions to be independent of the device (printer, monitor) that displays them. By having multiple resolutions of a character, it can be rendered live to tape or air for broadcast applications; output as RenderMan or another rendering standard for film quality; or delivered as geometry, motion, and audio to the Web for client-side animation.

The Web is a particularly rich ecosystem within which digital characters can thrive. The simple notion of distributing characters as body parts and motion instructions, rather than as video, is an ideal exploitation of the millions of internetworked computers around the world. Why send thousands of images, when you can send the raw materials, and let the client computer make the images, effectively multiplying the bandwidth?
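A back-of-the-envelope comparison shows the scale of the saving. Every number below is an assumption picked for illustration (frame size, channel count, model size), and the image stream is counted uncompressed, so real video compression would narrow the gap considerably.

```python
def video_bytes(width, height, bytes_per_pixel, fps, seconds):
    """Size of an uncompressed image stream."""
    return width * height * bytes_per_pixel * fps * seconds

def motion_bytes(channels, bytes_per_value, fps, seconds, geometry_bytes):
    """One-time geometry download plus per-frame motion channel values."""
    return geometry_bytes + channels * bytes_per_value * fps * seconds

# Assumed: 10 seconds at 30 frames per second.
video = video_bytes(320, 240, 3, 30, 10)        # raw 320x240 RGB frames
motion = motion_bytes(50, 4, 30, 10, 200_000)   # 50 channels + 200 kB model
ratio = video / motion
```

Under these assumptions the images come to 69,120,000 bytes against 260,000 bytes for geometry plus motion, and the geometry cost is paid only once however long the performance runs.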

Performance animation applies extremely well to such an environment, being one of the few techniques capable of providing significant amounts of animation quickly without huge bandwidth demands. The Web is aching to be `animated,' and live 3D characters are a great way to do that.

Our recent work on Virtual Bill for MTV is an ideal example of a character that lives both on broadcast and the Web. He first appeared as a virtual VJ on his own show, and eventually ended up as interactive, streaming 3D animation on MTV Online. Check out our web site to see a sample.

So, while performance animation is like a new kind of music that is still being experimented with, those who are using motion capture's innate qualities intelligently are making great tunes. As other facets of the animation community and audiences see that work, hopefully the technique will achieve its deserved status as a rich medium for storytelling. Performance animation won't replace other forms of animation; it is going to expand animation's place within entertainment, taking it to new places and creating new possibilities. Puppetology: it's not just a science, it's a cult.

Brad deGraf is the Chief Executive Organizm of Protozoa. After starting his career designing war game simulators for the military, he switched to computer animation because its end-users don't need secret security clearances.

Emre Yilmaz is a puppeteer/animator/director, and 4-year Protozoan. His work areas include puppeteering, applying motion capture to non-human characters, character setup, design, internet animation, and procedural programming. His projects include Max Rodentae, Flat, Floops, and others; awards include "Best Performance Animation" from the 1998 World Animation Celebration.

Note: Readers may contact any Animation World Magazine contributor by sending an e-mail to