Heather Kenyon goes behind the scenes of Medialab Studio LA to meet the people who create real-time, motion-captured characters by using a technique the studio calls "computer puppetry".
Download a Quicktime movie of Medialab's character "Broz" demonstrating the use of wireless acrobatics and shadows in real-time. © Medialab. Download a Quicktime movie of Medialab's character "Broz" demonstrating the use of exaggerated facial expressions and multiple camera angles. © Medialab.
Motion-capture is such a new form of animation that even the name for this animation technique is still in flux. Medialab has constantly been on the cutting edge of developing performance animation technology, announcing major advances to its proprietary software at regular intervals. Since the company was founded in 1989 in Paris, they have created some 30 characters including Nickelodeon U.K.'s Bert the Fish and Elvis, Pepe the Cricket from Steve Barron's The Adventures of Pinocchio and Cleo, who appears on Canal+'s Cyberflash, a show about cyber-culture. Medialab specializes in computer puppetry, which is a subset of the motion-capture by computer field. Computer puppetry differs from motion-capture in that the results of human body motion are fully rendered in real-time, as the motion is performed. Therefore, animation directors and performers can see the performance instantaneously and can then apply immediate corrections if needed. Medialab creates computer puppetry by combining this real-time capability with sophisticated devices to track not only human body motion but also facial expressions and lip synchronization.
One may have seen the workings of motion-capture before: a computer generated character is moved by an actor in a suit, who is connected to a renderer which in turn moves the CG character. However, we are going to take you through the process of creating a believable, computer generated character by going behind the scenes at Medialab Studio LA in Los Angeles, California. We are going to meet Drew Massey, a performer; Marcus Clarke, who has specifically trained people to work in the motion-capture industry; and Herve Tardif, one of Medialab's foremost technical developers.
The Actor's Role
Naturally a major factor of performance animation is the performer. However, I found the typical background of such a performer to be a surprise. One such example is puppeteer and performance animator, Drew Massey. Massey recently performance animated "Broz" for VDTV at the Western Cable Show and interacted live with audiences.
Heather Kenyon: Your bio says you're both a puppeteer and a performance animator. Can you explain the differences between these two different professions?
Drew Massey: Actually, with the technology at Medialab there's not a lot of difference. It's pretty much performing with real-time computer generated puppets.
HK: 'Performance animator': does that term apply to both the puppeteer working on the body and the face of a character?
DM: It's all the same thing. I do a lot of traditional hand puppetry as well, muppet style. With that you're controlling the head and the body of a character. No matter which way you split it up, hopefully, you are blending the performances of both people and making one believable character.
HK: What is your background?
DM: Standard puppetry. I've been involved in several movies, like The Flintstones and Men in Black. Those movies involved puppeteering with either cable-operated characters and sometimes some traditional hand puppetry, as well as a lot of electronic and servo animated characters.
HK: So you come from a puppetry background rather than an acting background? I am surprised you aren't an actor.
DM: As soon as I started getting into puppeteering, I started taking acting classes. I took acting in college and I'm an illustrator too. The process of making art move has always been very attractive to me. It's all about the character, so if you don't have any sort of acting background, it's a lot more difficult to make a believable character. The fact that I'm making all of the movements and voice choices doesn't get in the way. I started out as a puppeteer but I've become a much better actor because of it.
HK: Why did you get involved in motion-capture?
DM: Because it's cool! I really like computer animation. There's almost nothing more satisfying than seeing computer animation respond to your every move. It's just a blast. When it works well, it works really well. Motion-capture is a great outlet for a traditional puppeteer.
HK: How much do you work for Medialab?
DM: I go in at least once a week, sometimes twice. It depends. Mostly, I experiment with the system and figure out what I can do with it. It turns out to be quite a lot. Really I just get my own skills down to the point where creating a believable character is almost effortless.
HK: Is the demand for your services growing?
DM: I think it is growing. I know a lot of studios who are just doing motion-capture. Medialab is the only company, however, that's really concerned about getting whole characters together and hiring puppeteers and actors to do it. That was the thing that attracted me to Medialab in particular. They are so performance-oriented. It seems to me a lot of other companies are hiring mainly mimes or people who are specifically dancers and capturing their motion for a particular thing. Medialab is really concerned with bringing the whole thing together.
HK: When you're acting and talking to something that isn't there, what are your biggest challenges in making that look real?
DM: That's interesting. That's something you have to get used to as a puppeteer, the different portions of the body. When I see people that are really into dance, and really concerned with their body, it takes them a longer time to get used to it because they are not familiar with being outside their own bodies. Typically their bodies are the final medium. Every time I'm on a job I'm looking at [the] monitors, the camera's view of the puppet, and playing to that, so it's not that strange for me.
HK: How do you approach playing different characters?
DM: Like any acting job. I like it because the characters are so physically different. It's easier to get into their specific behaviors. It's easier to act different when you look so darn different.
Developing Performance Skills
Taking puppetry into the 21st century demands a completely new set of skills. Massey is one of the puppeteers who successfully auditioned to be trained by Marcus Clarke and Helena Smee when Medialab Studio LA opened. In addition to working with Jim Henson's studio in the U.K., Clarke has worked with Medialab in Paris and Japan and was sent to the new Los Angeles-based studio in January, 1997 to develop and train a pool of performance animators. A select group was chosen to hone a special skill set that would enable them to adapt their art and bring virtual characters alive using Medialab's technology and devices.
Heather Kenyon: What kind of background do most of your trainees have overall?
Marcus Clarke: In the past we've experimented with a general group of performers. Performance animation is basically divided up into the person in the suit and the person who does the facial animation. The first thing you have to do is introduce what performance animation is to the group. Basically I'm looking for performance skills, people who can express themselves. You're looking for people who understand the primacy of the screen image. By that I mean, you quite often get performers who are feeling a lot of what they do internally, but because of the technique that we use, we need actors to realize their emotions on the screen. What we've found in the past is that mime actors, from the nature of their work, relate everything to themselves. Likewise with dancers, who would seem to be the natural people to put in the costumes, they're doing wonderful movements but often because the character is a different dimension to themselves, we have collision problems. That's the number one thing, whereby the character's arm is going through its leg or body. What we need are performers who can look at the monitors and straight away get the dimensions, proportions, of those characters into their mind, then look at the screen. Then it's almost an animation process of finding what kind of rhythms work, in terms of walks, which kind of weight shifts work. Sometimes you'll find yourself doing something really strange, unnatural, but on screen, the character looks like he's walking brilliantly.
Likewise for the face, many successful characters are cartoonish. What you try to do is not put a realistic person on the screen, you try to put a performance on the screen. You have to really just be looking at what kind of facial gestures, working hand in hand with the body, transfer.
What the connection is between doing a hand puppet and going into a body costume is odd at first but when you think it through, it involves control, and thinking about how the characters should move. Most puppeteers can get a cardboard box to walk around, an inanimate object. Both require animation skills and dexterity. It's all happening on a subconscious level. Character creation's a bit of mystery, I think, to everybody. You just know when it works. Maybe this relates to things like, when the synthesizer came along, and the computer in terms of graphics. People thought, 'Drawing skills will be dead,' or, 'Being able to play the guitar will be dead.' Yet a lot of the people who succeeded with electronic music have a good understanding of music.
HK: What do you think are the dominant skills a performance animator has?
MC: Character creation, number one. The next thing is dedication. To become a good hand puppeteer is very hard work. If you didn't really, really want to do it, if you weren't obsessed with it, you wouldn't do it, because at the end of the day you're a puppeteer, not a qualified taxi driver. Something important!
HK: What is the most unique aspect of performance animation? That applies only to performance animation?
MC: You can perform a cartoon character in real-time. That's not the same as doing a key frame character in CG, a hand-drawn animation character or a puppet character. Certainly that's what attracts me.
HK: It looks like you just put on this suit, you move, and the character moves just like you...
MC: That's the worst thing you can do. Mimicking reality doesn't work. Often producers think, 'Well, this is great. We can just stick an actress in the costume, put a face reader on.' You can do all that, but you'll find it's not believable. If that was the case, I'm sure actors like Robert DeNiro wouldn't be paid what they are. That's like if someone comes across a puppet like Miss Piggy and puts their hand inside, then wonders, 'Why isn't it happening?'
HK: Do you see the demand [for motion-capture] growing?
MC: I hope so. I love working on the Medialab system. Sometimes when a new technology comes in it doesn't have a direct application because there are certain conventions already set up. Animators have said motion-capture isn't very useful. You have people who've had bad experiences with the early development of the technology. It takes a while for people to say, 'This is a useful tool. This is better.' When you have a new tool, there's often a little lag before it comes into common acceptance. I think that's what's happening now.
The Technical Process
Herve Tardif is one of the code writers based in Paris whose knowledge of Medialab's proprietary Clovis system (the engine that drives the real-time factor) is practically unparalleled. Clovis was first developed in 1989. Tardif is now going to take us through the technical side of making a character move and, more importantly, act believably.
Herve Tardif: One notion that is very important is the idea of skeletons. We are going to have one real person wear a number of sensors. These sensors measure the position and the orientation of the segment on which they are attached. We are working with electro-magnetic technology, which consists of one source emitting a field and a receiver measuring that field. After some processing, it gives information on the position and orientation. With this information we are able to build a skeleton that is going to be exactly or very close to the skeleton of the real person. We will have a copy of the real person. That copy amounts to building a skeleton of the real actor, and attaching the different values received by the sensors to the proper segment of the skeleton we just built. You can imagine conceptually at that stage, we have a skeleton that moves exactly the same way the person moves. That kind of information is already useful for our work. For people who are interested in the motion acquisition business, that is pretty much what they expect: a skeleton along with the orientation of each of the segments of the skeleton.
Another application, which is our most common application, is indeed to drive virtual characters. At that stage we have a skeleton, which is a copy of the real person, and another skeleton of the built, or virtual character. These skeletons may be very different because we may want to animate a gorilla, or a very thin woman, or a very big and fat character. There are many chances that the skeleton of the virtual character will differ greatly from the real actor. At that stage, what we do is a mapping of one skeleton to the other. This gets very tricky and it's where we have a lot of proprietary information. This is a major issue for all people involved in motion acquisition [motion-capture]. When the proportions are quite the same, it's just a straight adaptation. It's easy. But when the proportions are different, it can get pretty tricky.
Usually there are a couple of things we need to ensure. These things are usually the location of the feet on the ground. We do mathematics to ensure that our virtual character will always have his feet on the ground. Starting from there we can go up the hierarchy of the skeleton, and take the values from the real actor and place them on our virtual skeleton. Once we've done that, when the real character moves, the virtual character moves. The more different the virtual character is from the real actor, the more different the motion is going to be. Suppose the script says that the character should scratch his head. With the two skeletons being different, it is very unlikely that when the real actor scratches his [own] head that the virtual character will indeed scratch his head. But if we show the result of the virtual character, the puppeteer will adjust to that and if he or she is asked that the virtual character scratch his head again, he or she may expand his or her motion further away, maybe go behind the head or before the head but on the screen the results are going to look like what we are expecting. They rely heavily on visual feedback of the virtual character in order to do motion that will be the required action.
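The idea Tardif describes, placing the actor's joint values on a differently proportioned skeleton and then keeping the feet on the ground, can be illustrated with a small sketch. Medialab's actual mapping is proprietary and is not documented in this interview, so everything below is an assumption made for illustration: the 2D stick-figure skeleton, the joint names, the `forward_kinematics` and `retarget` helpers, and the simple rule of shifting the whole figure so that its lowest foot sits at height zero.

```python
import math

# Hypothetical 2D character skeleton: joint name -> (parent name, bone length).
# Parents must be listed before their children. Proportions are invented.
SHORT_CHARACTER = {
    "hips": (None, 0.0),
    "spine": ("hips", 0.3),
    "left_foot": ("hips", 0.4),
    "right_foot": ("hips", 0.4),
}

# Hypothetical joint angles (radians, relative to the parent) as they might
# be measured on the real actor: spine pointing up, legs slightly apart.
actor_angles = {
    "spine": math.pi / 2,
    "left_foot": -math.pi / 2 - 0.2,
    "right_foot": -math.pi / 2 + 0.2,
}


def forward_kinematics(skeleton, local_angles, root_position):
    """Walk down the hierarchy and return the position of every joint."""
    positions = {}
    world_angles = {}
    for name, (parent, length) in skeleton.items():
        if parent is None:
            positions[name] = root_position
            world_angles[name] = local_angles.get(name, 0.0)
            continue
        angle = world_angles[parent] + local_angles.get(name, 0.0)
        px, py = positions[parent]
        positions[name] = (px + length * math.cos(angle),
                           py + length * math.sin(angle))
        world_angles[name] = angle
    return positions


def retarget(character, angles, foot_joints, root_position=(0.0, 0.0)):
    """Apply the actor's angles to the character, then ground the feet."""
    pose = forward_kinematics(character, angles, root_position)
    # Assumed grounding rule: shift everything so the lowest foot is at y = 0.
    lowest = min(pose[joint][1] for joint in foot_joints)
    return {name: (x, y - lowest) for name, (x, y) in pose.items()}


pose = retarget(SHORT_CHARACTER, actor_angles, ["left_foot", "right_foot"])
for joint, (x, y) in pose.items():
    print(f"{joint}: x={x:.3f}, y={y:.3f}")
```

Because the character's bones differ in length from the actor's, the same angles put its limbs in different places, which is why the performer watches the monitor and adjusts rather than trusting his or her own body.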
Heather Kenyon: Can you talk a little bit about how the facial movements are done and the lip synchronization?
HT: Again, with respect to these things, we rely heavily on the puppeteers. We use a multiple object interpolation technique. For example, we have our computer graphic artists design extreme positions for the mouth. Let's take a sample case: We have a face and we want to open and close the mouth. We may have an expression of sadness and one of happiness. What we're going to do is to have several extreme positions. The puppeteers control the characters through a set of variables. One could be the opening of the mouth and another the mood. Now, if you put these two expressions, or variables, on a glove, they will be able to play independently as the character opens and closes its mouth. Then it's pretty much up to the puppeteer to give the virtual character some lively expression by moving his fingers. The example that I have just described to you is a very simple one. But you can imagine that you can have more degrees of expression with more variables. Common variables are the opening of the mouth and the mood. It can also be mouth expressions. We are very careful in the design process to model the expressions so that they are workable.
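A minimal sketch of this kind of interpolation between artist-designed extremes follows. It assumes a simple linear blend in which each variable adds its share of the offset between a neutral face and one extreme position; the interview does not say how Clovis actually combines the shapes. The three-point mouth, the `NEUTRAL` and `TARGETS` shapes, and the `blend_face` function are hypothetical names chosen for the example.

```python
# Hypothetical three-point mouth: left corner, right corner, lower lip.
NEUTRAL = [(-1.0, 0.0), (1.0, 0.0), (0.0, -0.1)]

# Extreme positions designed by the artists (invented values).
TARGETS = {
    "open": [(-1.0, 0.0), (1.0, 0.0), (0.0, -1.0)],   # lower lip dropped
    "happy": [(-1.0, 0.5), (1.0, 0.5), (0.0, -0.1)],  # corners raised
}


def blend_face(neutral, targets, weights):
    """Blend the neutral shape toward each extreme by its weight (0 to 1)."""
    blended = []
    for index, (nx, ny) in enumerate(neutral):
        x, y = nx, ny
        for name, shape in targets.items():
            weight = weights.get(name, 0.0)
            # Add this variable's share of the offset from the neutral pose.
            x += weight * (shape[index][0] - nx)
            y += weight * (shape[index][1] - ny)
        blended.append((x, y))
    return blended


# Two glove variables played independently: mouth half open, fully happy.
face = blend_face(NEUTRAL, TARGETS, {"open": 0.5, "happy": 1.0})
print(face)
```

Each weight stands in for one finger of the glove, so the mouth can open and close while the mood changes independently.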
As far as the lip synchronization is concerned we have two major techniques. One is a manual technique where the puppeteer is doing the lip synchronization. In this technique, the puppeteer listens to the voice material [recording] and does the lip synch by hearing the soundtrack and manipulating his fingers so that he gives the impression that the virtual character is talking. This is very difficult and you need very talented puppeteers.
The second technique that we derived only came recently because we realized that we couldn't always rely on talented puppeteers. For some productions, they may not be available. We decided it would be a good idea to have some method of automatic lip synch. We started this very efficiently with the help of a small company called Ganymedia in France. These people have a lot of experience in voice recognition. The way that the technique works is that the voice talent produces the soundtrack that we want the virtual character to say. The voice talent is filmed with a camera just in front of him, with his lips painted in blue. With this, and the help of some recognition techniques, we are able to derive the opening of the mouth in real time and the roundness of the mouth. These two parameters are then fed into our system, instead of having a puppeteer doing the opening of the mouth and the roundness of the mouth. Since it's automatic it tends to work better for not-so-talented or not-so-experienced puppeteers. The system will allow us to detect even more mouth forms than roundness and opening. We are currently talking with these people in France in order to improve an automatic lip-synch system so that it can detect more mouth positions. It is our belief that this technique will work better for realistic characters. We do not plan on using this technique for non-realistic characters. We will stick to the first technique that is done totally by puppeteers.
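To make the automatic approach concrete, here is a rough sketch of how two mouth parameters might be read from a frame in which the lips are painted blue. It is not the recognition technique used by Medialab or Ganymedia, which the interview does not describe. The color threshold in `is_lip_blue`, the bounding-box measurement, and the definitions of "opening" and "roundness" in `mouth_parameters` are all assumptions for illustration.

```python
def is_lip_blue(pixel):
    """Assumed test for painted-lip pixels: strongly blue-dominant color."""
    red, green, blue = pixel
    return blue > 150 and blue > 2 * red and blue > 2 * green


def mouth_parameters(frame, rest_height):
    """Estimate (opening, roundness), each from 0 to 1, from one frame.

    frame is a list of rows of (red, green, blue) pixels. rest_height is the
    height in pixels of the lip region when the mouth is closed.
    Returns None when no lip pixels are found.
    """
    rows = [y for y, row in enumerate(frame)
            if any(is_lip_blue(pixel) for pixel in row)]
    cols = [x for row in frame
            for x, pixel in enumerate(row) if is_lip_blue(pixel)]
    if not rows:
        return None
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    # Assumed definitions: opening grows as the lip region gets taller than
    # at rest; roundness approaches 1 as the region becomes as tall as wide.
    opening = min(1.0, max(0.0, (height - rest_height) / rest_height))
    roundness = min(1.0, height / width)
    return opening, roundness
```

The two numbers returned would then take the place of the values a puppeteer supplies by hand for the opening and the roundness of the mouth.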
HK: What kind of hardware are you using to render these images in real-time?
HT: We are using, extensively, SGI. We have a decent product because SGI goes from not-so-cheap to very expensive. We have characters that can work on the Octane workstation which costs roughly 400,000 French francs [US $80,000]. We have some characters working on Octane which is mid-range, and we also have some very well refined and detailed characters that run on Onyx workstations. This is the top of the line.
HK: How are these electro-magnetic signals fed from the person to these machines?
HT: This is another part of the equipment. Basically, the system provides you with position and orientation of a certain number of sensors. We feed this information into the SGI machine. We have our software on the SGI machine that reads these values and adapts them.
HK: So this is predominately proprietary software?
HT: Yes. Except for hardware, the majority of our software is proprietary.
HK: What functions does this proprietary software allow you to do that is unique to Medialab?
HT: It's more of a whole package, an overall level. I have seen companies that do a very good job in terms of rendering characters or motion acquisition, but I haven't seen companies that can really animate a character the way we do in real-time. Our software has been designed to do real-time animation and to be used to produce TV animated series. We are now capable of doing real-time characters with shadows. We are also able to use our system on a real live set. When it comes to compositing computer graphics with real live shooting, there is one notion that is very important and that is coherency between the real world and the virtual. For instance, when you have a real character talking to a virtual one, you need to pay a lot of attention to your cameras. We came up with a way of calibrating the virtual camera with the real camera. It's a very simple process that allows us to integrate, in a very believable manner, the virtual character with a real environment. This process works with fixed cameras that do not move. One very big improvement would be to allow for camera motion. This gets into the field of virtual studios. Recently, we have been working with a company called Symahvision. They offer a system that can track a real camera, shooting a real live scene, and then provide us with camera positions to match our virtual camera position. With this system we should be able to integrate virtual characters with a live set. This is going to increase the credibility of the compositing. It's one thing to see a virtual character talking with a real person, but having these two characters filmed with a moving camera is really something else. It adds a lot. It is a very large technical difficulty. We are going to use that system in production very soon. We are in an extensive test period. We are trying to use the system on a show which is being produced for a TV channel in France.
HK: When do you think we can expect to see this?
HT: March. Recently, we switched to a wireless system. We used to use a wired system where the actor was linked to an electronic unit, with 16 cables which really restricts the motion. That was one of the major drawbacks. Now we are working with a company called Ascension Technologies. We've been using that wireless system for over a year now and it's giving some pretty good results. We can now have an actor walking on a stage without him or her being linked to any wires. Before we couldn't roll on the ground or turn around many times. Now we can do all of this very well. We even have someone doing gymnastics like backflips right now at the studio.
HK: Where is the future of this technology?
HT: Our goal is to come up with a system that TV channels could use or even direct live television. We are working on the camera issue because we know they will want that. We will also probably see several characters. Right now we have one character when we shoot. When we record, we do one character at a time. In the future we will have multiple characters interacting. There is a huge number of difficulties to get to that, combining the two worlds is difficult and we need to be very precise.
HK: How far away do you think that is?
HT: We already did that on some shows. The level of interaction is low, because it is difficult, but I believe that very soon, probably this year, you shall see some virtual characters interacting. We've been in the field for six years now and it's getting to a point where people at some TV channels in the U.S. are ready to go for it. It's already being used in Europe by Canal +, TF1, FR2 and Nickelodeon UK.
The Quicktime movies used in this article were created at Medialab Studio LA with Paul Pistore as the face and voice, Lydee Walsh as the body, Michael Hand as technical director and Robin Howard as talent coordinator. Heather Kenyon is Editor-in-Chief of Animation World Magazine.