Peter Plantec tackles the importance of virtual acting, discovering the tricks and tools top companies are using to bring life to their digital characters. Includes a QuickTime clip!
If you have the QuickTime plug-in, you can view a clip from the film by simply clicking the image.
Several years ago, I got to spend some quality time alone with Ray Harryhausen. It was only about half an hour, but I count it as a career highlight. Although I don't recall his exact words, Harryhausen told me that he always had a deep commitment to the animated performance, believing that a great one was as engaging and emotionally telling as life itself, or at least as a great human performance. The thing that convinced me was his work on Mighty Joe Young. I know that he's right: it's not just movement; it's performance.
Following in Harryhausen's footsteps is Dr. Mark Sagar, who was responsible for much of the wonderful facial animation in Peter Jackson's King Kong. Sagar too has a commitment to performance. I've been fairly outspoken about Kong's MoCap performance being just awful. He looks like a 4,000-pound gorilla bouncing around Times Square like a 190-pound man. It just doesn't sell. But the face was a different matter. I said to myself, "Kong's face doesn't look MoCapped." MoCap in general still has a look that I'd rather not see. Kong's face -- and all the other truly great facial performances so far -- arise mostly from the brilliance of the individual master character animators, each subtle movement of the face imbued with the loving touch of a dedicated and talented artist. I understand that as much as 75% of Kong's face was keyframed, and I think it made the movie. But keyframing is an expensive, time-consuming luxury, and it's all changing.
Perhaps the finest example yet of this hybrid approach, MoCap tracking combined with keyframe animation, is found in Pirates of the Caribbean: Dead Man's Chest. I believe ILM has crossed the valley into the land of believable virtual human performance, but more about this later on.
With the unprecedented popularity of animated performances these days, we need a way to increase production, believability and magic. When it takes the delicate hand of a great character animator -- and they are few and far between -- you're not going to see useful increases in output. It certainly would not be possible to keep up with even current demand and tight schedules. So we end up with what happens all too often these days: inconsistent animation of the same character. Shrek, as successful as it was, went from great (for example, the eye animation on Princess Fiona) to mediocre at best, with inconsistencies within characters. Shrek is not alone: the Star Wars franchise has had its uncanny digital actors as well. Not to pick on them, but nearly all virtual performances in features have been inconsistent because of the number of people working at different levels on them. Still, they were entertaining and they set the ball in motion. We now need better virtual performances. Imagine going back to the old Felix the Cat animations: you might still find them entertaining, but the animation would likely be annoying; it is to me.
Taking a large portion of the responsibility for life-like performance out of the hands of lower-level animators is changing the business big time. In a way, it's sad. Many animators are developing a whole new set of skills. It actually takes a different, more technical type of person to be a character animator these days. Some of the great artistry is being lost, and I, for one, think that artistry can never be fully replaced by technology. I call these newbies tech-animators. Most of the ones I've talked with recently truly have hand-animation talent, but they've been trained to use technology to power through a ton of shots quickly. I believe that in the near future this new breed of tech-animator will evolve into a population with little or no knowledge of traditional animation, digital or otherwise. They will know the technology and how to use it to create life-like performances. Thus we are bound to see a decline in animated character performances à la Chuck Jones et al.
With demand putting pressure on producers, it's necessary that we move on. I'm thinking there will be a place for master character animators for a long time in tweaking Hero characters to give them that special animated personality look, one that's bigger and more engaging than any recorded performance. But I've already seen a small army of tech-animators on the rise.
Getting back to technology, though, there are three basic approaches that I'm looking at, and all involve MoCap. First, the long-standing standard MoCap approach and its newest refinements, which are making capture more accurate. The second uses fewer capture markers to give the general performance, with the detailed nuances created by an underlying virtual face structure. The third is the use of virtual sensory perception and Artificial Intelligence (AI) to automatically create behavior streams on-the-fly using MoCap libraries and behavior blending.
A Time Of Transition
Sony Pictures Imageworks is one of the most respected vfx/animation houses, and I think they have a great attitude about virtual performance. They've been responsible for some of the wonderful virtual performance work done in movies such as Narnia and Monster House and many more. Imageworks seems to believe in both the power of MoCap and the magic of talented people. It's one place where you'll find a lot of talented animators working hand-in-hand with new technology. According to Debbie Denise, evp of production infrastructure/exec producer at Sony Pictures Imageworks, the studio's basic philosophy toward animation and character performance strongly impacts its R&D and staffing efforts. It may sound cheesy, but it's that philosophy that makes all the difference in what you see on screen.
Imageworks' approach and philosophy is that, "through motion capture and animation tools, we try to preserve the essence of what actors and directors bring to the performance of a character. This applies to body and/or facial capture. To this end, we are always trying to improve our ability to get that performance on screen as faithfully as possible, taking into account the design and feel of the animated character."
I asked if she thought all the equipment and greenscreen claptrap hampered performance at all: "We feel that it's critical to allow the performer and the director to be unimpeded by the technology we deploy to capture the data. That's why our R&D group is working hard to develop new and better ways to capture the data with less invasive technology and in creative environments." Naturally, I wasn't able to get many details of the actual R&D efforts or how next year's Beowulf improves the process. However, this is what she offered: "As you know, there are several ways to capture a performance. Generally, we go about getting the character's skin to animate by tracking drive points on the mesh skin surface directly with the markers (on the actors); another approach is to develop software that analyzes the movement of the markers to trigger muscles or shapes to deform the surface."
"Our current solution uses both of these techniques. We work directly with the data in an extremely efficient way, and animators can alter it to reflect any changes in dialogue, sight lines or exaggerated or toned-down movement. And we can choose the degree of application of each method in all shots. This seems to us to be the best of both worlds."
"We are also working on methodologies that would give us ways to capture the facial performance with as much data as we currently capture, but with few or no markers on the performer... in all types of environments or stages... Not too much to ask!"
I knew that Imageworks has long taken character animation seriously on many levels. I asked Denise to comment in general on the artistry vs. technology balance of their philosophy: "No matter how great the technology is, the animators are still the ones who turn the data into magic. To that end, we build animator-friendly tools that allow the animators to enhance the data intuitively, rather than through overly complex user interfaces." She stressed that for Sony, it's as much about art as it is about technology.
The Magic of MoCap
Until recently, the MoCap approach has required actors to don shiny little balls or other markers that could be tracked by multiple special video cameras within a confined space. In some cases, it still does, but as demands for accuracy and comfort are heard, some new trends are emerging. Recent photometric approaches involve tracking facial characteristics in new ways. This sometimes means the actor has their face done up in strange makeup in lieu of markers. These new approaches claim higher data density, accuracy and comfort for the performance actors.
With MoCap, as you probably know, human actors perform the character's lines and business while it's captured on video and translated to a flow of mathematical motion data. The data is filtered and interpreted by 3D animation software on a point-by-point basis using standard plug-ins or proprietary pipeline elements. Essentially, the cleaned motion data is used to manipulate the 3D character mesh. As the actor's face or body moves, so goes the 3D mesh doppelganger. In the best productions, master character animators tweak the performance before final render and compositing so that it has character that can't yet be captured. The results can be spectacular.
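Stripped of every studio's proprietary refinements, the core idea of marker-driven deformation is simple enough to sketch in a few lines. The following toy example is mine, not any studio's pipeline: each tracked marker displaces a nearby drive point on the mesh, and surrounding vertices follow with falloff weights.

```python
import numpy as np

def deform_mesh(neutral_verts, marker_deltas, weights):
    """Apply per-frame marker motion to a character mesh.

    neutral_verts: (V, 3) rest-pose vertex positions.
    marker_deltas: (M, 3) each marker's displacement from the actor's rest pose.
    weights:       (V, M) influence of each marker on each vertex.
    Returns the deformed (V, 3) vertex positions for this frame.
    """
    return neutral_verts + weights @ marker_deltas

# Toy example: two vertices driven by one marker.
neutral = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
delta = np.array([[0.0, 0.1, 0.0]])   # the marker moved up by 0.1
w = np.array([[1.0], [0.5]])          # second vertex follows at half strength
frame = deform_mesh(neutral, delta, w)
# first vertex rises the full 0.1; the second only 0.05
```

Real pipelines add filtering, retargeting from the actor's proportions to the character's, and stabilization against head motion, but the weighted-displacement core is the same.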
The biggest problem with this approach has been capturing the subtleties of the face, especially the eyes. It's often impractical to get a master face animator to do the magic necessary for a believable performance. Polar Express is an example of how disturbing inappropriate eye movement can be. I'm happy to say that ILM did a spectacular job of eye capture in Dead Man's Chest.
One of the most effective refinements of the standard MoCap approach comes in the form of Stretchmark, a software system being developed by Pendulum. It's designed to use high-definition MoCap data in an innovative way to produce highly realistic character performances.
Artist/animator/engineer Robert Taylor is a co-owner of Pendulum, an animation studio down in San Diego. Taylor recognizes the need for more and better character animation in ever-shorter time frames. "We don't use a model of underlying muscle systems, because I've seen that tried and so far I haven't been so impressed. We take a more global approach, using the captured data stream to control a custom set of blend-shapes. We did a lot of trial-and-error work to come up with about 45 base blend-shapes for the human face mesh. This base group can be blended to reproduce a huge variety of emotional expression."
"We like a lot of data; for example, Mark Anthony (an impressive demonstration of concept) had 90 data points on the face. The work was done at House of Moves using the latest Vicon system. First, we apply a data fit to the mesh using our Stretchmark software and then run the capture data to see how it goes. It's never ideal, so we then apply what we call a corrective ID. The base blend-shapes cover most of the performance, but we will see breaks in the performance where the base blend-shapes couldn't cover a particular expression. Our sculptor then creates any needed custom blend-shapes to cover those. It's a quick process, taking only about a half-day or so. With the new shapes, the performance will be smooth and complete. We have a lot of tweaking tools that we can use to then customize the animation. We can also control the weighting, or influence, each point has on the mesh. We can add multipliers to get cartoonish behavior, or we can even modify the final performance using puppeteering tools or hand animation to tweak the performance. We set it up that way because we're animators, and we love to tweak the final performance."
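Blend-shape mixing of the sort Taylor describes is, at bottom, a weighted sum of sculpted deltas. Here is a minimal sketch of the classic technique (the shape names, weights and one-vertex "face" are invented for illustration; Pendulum's actual Stretchmark internals are not public):

```python
import numpy as np

def blend(neutral, shapes, weights):
    """Classic delta blend-shapes: result = neutral + sum_i w_i * (shape_i - neutral).

    neutral: (V, 3) rest face.
    shapes:  dict of name -> (V, 3) sculpted target shape.
    weights: dict of name -> float, e.g. driven per-frame by MoCap data.
    """
    result = neutral.astype(float).copy()
    for name, w in weights.items():
        result += w * (shapes[name] - neutral)
    return result

# Toy face with a single vertex and two target shapes.
neutral = np.zeros((1, 3))
shapes = {
    "smile": np.array([[0.0, 1.0, 0.0]]),      # mouth corner pulled up
    "jaw_open": np.array([[0.0, -1.0, 0.0]]),  # mouth corner pulled down
}
face = blend(neutral, shapes, {"smile": 0.5, "jaw_open": 0.25})
# the vertex lands at y = 0.5 - 0.25 = 0.25: a half smile tempered by a quarter jaw-open
```

Taylor's "multipliers to get cartoonish behavior" would correspond to scaling these weights past 1.0, and his "corrective ID" to adding new sculpted targets where no combination of the base set covers an expression.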
Some New Approaches
MoCap is the basis for all the tech-animation approaches. One interesting and effective new approach is Face Robot, which is available from Softimage. It's a software approach that sits on top of Softimage XSI, bringing you that remarkable set of tools and making it a full-face animation system. You can do everything from sculpting to final render with this setup. "Face Robot is a high-end standalone product that incorporates Softimage within it," explains special projects manager Michael Isner. "Our solver has two inputs: the MoCap and our soft tissue model. This is the first face performance software that contributes to the performance. Our soft tissue model is tweaked in cooperation with the director's and art director's input, to reflect what they want from the face movements. It's an artistic process. The artist works with what I describe as soft IK in a jellyfish. We use a Wizard to create the core model of the face and how that underlying jellyfish will behave. Then we attach the MoCap data to specific points. The beauty of it is that we use a small number of markers, because all the in-between portions of the face move in a life-like way on their own. They're not locked into a hard set of rules, nor are they dependent on static morph targets. The facial expressions are more dynamic, more life-like than you get by any other means."
In a sense, the artist creates a face personality that will create a look and performance that is original, though based upon a face actor's performance. I've seen some early attempts at using underlying muscle structure and the like to help create emotional expressions before, but none of them have been very impressive. It's the uncanny look of slightly inappropriate movement that kills it. Face Robot doesn't have it down perfectly yet, but it is certainly a unique approach that is capable of giving fine performances and saving a lot of money in the long run.
One of the most technologically cutting-edge vfx and animation houses in the U.K. is Double Negative. I spoke with Paul Franklin about the DoubleNeg character animation pipeline. "Over the past couple of years Double Negative has been developing a new facial motion capture system. We've been collaborating with another U.K.-based company called Image Metrics, which has devised a unique method of analysing video footage of human faces in motion. This analytical process produces very detailed data, and the team at Double Negative has worked out how to take this very dense, abstract data stream and plug it directly into our proprietary character animation pipeline."
I had heard that their proprietary system uses a sort of biological substructure of virtual muscle and bone that is controlled by the dense Image Metrics data stream. "Yes, the key to this system is an understanding of how the underlying muscle groups in the human face combine to produce recognisable expressions and emotion; by sculpting the shapes resulting from the contraction and relaxation of the muscles in unison, a powerful character rig can be built with a minimum of animation controls. The same philosophy informs the approach to analysing the data generated by the video capture, so there is a bridge between the two ends of the process."
I'd also heard that the Image Metrics approach to MoCap, called CyberFace, doesn't require all those nasty face markers. Way back, Famous Faces had a system that would track facial characteristics without markers, but they never really went forward with it. I thought it a great idea at the time. Image Metrics has definitely taken this approach to a high level. "Yes, perhaps the most striking aspect of the whole methodology is that the capture sessions are marker-free -- the actor only wears light makeup so as to emphasise various key facial features," Franklin adds. "This provides a major advantage over other techniques, where the large numbers of markers placed on the actor's face can often prove to be an unwelcome barrier between the director and the performance. We have also worked out how to run the video capture simultaneously with standard full-body optical marker-based capture, so we can record the entire performance at the same time."
"Despite the process being marker-free, it produces a very detailed recording of the subtleties of a performance. This is down to the fact that rather than sampling discrete points on the human face and then interpolating the missing data, our technique records continuous moving shapes taken from the eyes, mouth, cheeks, etc. The analytical process then relates this detailed, yet localized, data to a comprehensive database of human expression, generating the animating muscle combinations that went into making the recorded shapes. This animation data then goes directly onto the same character controls used by our animators when they are keyframing a performance from scratch. The data can be left in its raw state for an unedited version of the performance, or it can be worked on using a suite of in-house tools that allow our animators to use as much or as little of the captured performance as they like to build the final character."
One of the main complaints with MoCap is all the rigmarole that goes with it. I've heard actors and dancers complain about the costumes and markers, crews can get frustrated setting up multiple capture cameras, and in general MoCap is a pain way below the neck. I asked Franklin if any effort or thought had been put into making the system easier on the director, crew and actors: "One of the most exciting aspects of this new approach to motion capture is that it eliminates a lot of the complex preparation generally associated with the whole field of performance capture. It also removes many restrictions placed upon performance -- previously, actors might have to keep their heads completely still or be confined to a very limited area on the stage. At its most simple, this process can work with a single video camera with basic lighting. Adding more cameras and higher resolution enables more coverage and looser framing, which in turn allows performances to flow naturally -- another barrier removed between the director and the drama."
The ability to capture a face with one camera sounds remarkable. I assume the software infers the 3D movement, since a single camera gives you no 3D position tracking. Of course, in most cases they use multiple high-resolution cameras. Franklin says they're currently using this technology on several high-profile movies, and he believes the results will be spectacular.
I have a lot more research to do on Animal Logic, but I just had to mention them here. They were greatly responsible for the spectacular virtual performances in Happy Feet. Audience appeal of this movie has been spectacular, in no small part because of the amazing dance sequences, fluffy stars and cute personalities. I understand that the crowd scenes were primarily animated using new proprietary crowd animation software known in-house as Horde, which takes multiple performances with variation and then randomizes them further through time and space warping. Very naturalistic crowd scenes were made possible through this tool. More procedural and cycle-based crowd work was handled by Massive, which also must be mentioned. More on Animal Logic down the road.
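The "time and space warping" idea is easy to illustrate, though I should stress that Horde's internals are not public and this sketch is only my reading of the general technique: take one captured clip, jitter its playback speed, and offset its position so that hundreds of agents sharing the same source clip never move in lockstep.

```python
import random

def vary_clip(clip, time_scale_range=(0.9, 1.1), offset_range=(-2.0, 2.0)):
    """Return a randomized copy of a captured performance.

    clip: list of (time, (x, y, z)) keyframes for one performance.
    Time warp: playback speed is jittered by a random scale factor.
    Space warp: the root position is shifted by a random ground-plane offset.
    """
    scale = random.uniform(*time_scale_range)
    dx = random.uniform(*offset_range)
    dz = random.uniform(*offset_range)
    return [(t * scale, (x + dx, y, z + dz)) for t, (x, y, z) in clip]

# One captured dance step, reused for a whole colony of penguins.
base = [(0.0, (0.0, 0.0, 0.0)), (1.0, (0.0, 1.0, 0.0))]
crowd = [vary_clip(base) for _ in range(500)]
```

Each agent gets its own phase, tempo and placement, which is what makes a crowd built from a handful of source performances read as hundreds of individuals.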
I've written about Massive, the intelligent crowd simulation system, on several occasions. It's so good it floors me. Briefly, it uses artificial intelligence and virtual perception to select appropriate MoCap sequences, blends them and attaches them to characters in a highly realistic way. The result is that sequences animate themselves. In speaking with the Massive team at Rhythm & Hues, I discovered that there is always a ton of excitement when they gather to review the final render. They have no idea what the characters will do, but they are always amazed at how flawless it usually looks. Massive is available for licensing if you need smart, believable crowds.
However, in chatting with Stephen Regelous, founder and product manager at Massive, he suggested that the application of AI need not be limited to crowd behavior. After all, each individual crowd character has to perform in believable ways. Blend-shapes could be scripted within Massive Prime to emulate believable face expressions and emotions. Last year he implied that he had an interest in pulling high-quality intelligent Hero performances out of Massive. Imagine using artificial intelligence animation software with Hero characters and getting believable performances. Having seen what Massive can do, I believe it's not far off. In fact, I spoke with R&H Massive supervisor Dan Smiczek, who says: "It's all there, built right into Massive Prime. It can handle very highly detailed face models using blend-shapes. You can get extremely fine animation details like eye blinks and impressive emotional expression. The really neat thing is that the blend-shapes are controlled directly by the AI. So you can have, like, one character yell at another character, and the one yelled at will hear that and react appropriately, say with a jerk and a nasty face." I asked Smiczek if they'd been using Massive in this way, and he adds: "Not yet. R&H has its own outstanding face animation software that we've been using, but it can be done."
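Smiczek's yell-and-react example boils down to a perception rule driving blend-shape weights. Here is a deliberately crude sketch of that idea in general terms (agent structure, shape names and the hearing radius are all my inventions, not Massive Prime's actual brain-node system): if an agent hears a yell within range, its face eases toward a startled expression; otherwise it relaxes toward neutral.

```python
import math

def react(agent, events, hearing_radius=5.0):
    """Drive an agent's face blend-shape weights from perceived events.

    agent:  dict with 'pos' (x, z) and 'face' (dict of blend-shape weights).
    events: list of dicts like {'type': 'yell', 'pos': (x, z)}.
    """
    heard = any(e["type"] == "yell" and
                math.dist(agent["pos"], e["pos"]) < hearing_radius
                for e in events)
    target = {"brows_up": 1.0, "mouth_open": 0.6} if heard else {}
    for shape in ("brows_up", "mouth_open"):
        goal = target.get(shape, 0.0)
        # ease toward the goal instead of snapping, so the motion reads as organic
        agent["face"][shape] += 0.3 * (goal - agent["face"][shape])
    return agent

bystander = {"pos": (1.0, 1.0), "face": {"brows_up": 0.0, "mouth_open": 0.0}}
react(bystander, [{"type": "yell", "pos": (2.0, 2.0)}])
# after one step the brows are already on their way up
```

Run every frame, rules like this produce the emergent, unscripted reactions that make a rendered crowd surprise even its own creators.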
I know Regelous has had his sights set on intelligent automatic Hero animation for some time. He built an unexpected amount of face animation capability into Massive, and many users don't have the slightest idea of how powerful it is. You can develop an extensive library of FaceCap data sets and then script the Hero character, using the built-in visual and auditory perception, to react. Remember, you can also hand animate on top of this, if you like. I suspect this will shortly become an area of heavy use as studios learn that Heroes too can be intelligently autonomous.
I think creepy eye movement has bothered me more than anything else in the virtual acting world. However, I think the guys at ILM have finally captured the holy grail under the brilliant leadership of supervisor John Knoll, R&D director Steve Sullivan and director Gore Verbinski. Davy Jones is a kind of hybrid character created by English actor Bill Nighy, a vast team of amazingly talented tech-animators and a director with the eye. In a sense, Davy Jones is probably the most advanced case of digital makeup ever conceived. With a ton of innovative approaches that I'm still exploring, Verbinski and ILM managed to extract and compile perhaps the most perfect virtual performance in history to date.
Jones is the character with the beard made of octopus-like tentacles. What you see is virtually all digital: the entire performance. It's lively, exciting and real, even magical, but it's all virtual. Or is it? I honestly don't know how to classify it. Jones is a MoCap/keyframe hybrid because, even though he is virtual, Nighy is fully represented in the performance of that digital makeup, which is tracked to the actor throughout the action. Can you say that virtual makeup gives a performance? I think we have to in this case. And the eyes -- even in the close-ups I'm convinced the eyes are really Nighy's -- they're all digital. Amazing.
Meanwhile, the tentacles were animated using an articulated rigid body solver and flesh simulation developed by Ron Fedkiw's team at Stanford. Nevertheless, I'm told much of the performance was keyframed and tracked flawlessly to the actor. But what techniques they used to get it this perfect is beyond me.
ILM had to develop an innovative way to track the virtual makeup onto Nighy as well. It was done not in the typical greenscreen environment, but live on set and on location during a regular shoot, surrounded by actors in costume. Nighy had to wear something like a black-and-white checkered tracksuit with a skullcap and headband for tracking. His face had tracking dots and black rings around his eyes, so he looked a bit peculiar. He was out there in the water and on the beach, acting with the other players, who were resplendent in their wonderful costumes.
Hank is part of the Artificial Actors project at the Filmakademie Baden-Württemberg. © Filmakademie Baden-Württemberg.
It was necessary to track the virtual makeup to Nighy with great precision. To do this, Knoll, Sullivan and the R&D team developed what they call the Imocap process. Each performer is put in a suit with special tracking marks on it. Two high-resolution witness cameras are positioned on either side of the film camera. Parallax data is then used to triangulate the positions of the special markers. Knoll credits the R&D team for coming up with remarkably clever software capable of reconstructing skeletal motion from the relatively straightforward triangulation data. What makes it even more amazing is that it was all done on location without the normal sources of power, often in knee-deep water. The entire system is portable, rugged and clearly robust.
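The parallax triangulation at the heart of any two-witness-camera setup is textbook geometry. This is the classic rectified stereo case, not ILM's actual Imocap code (the numbers and function are mine): two cameras a known baseline apart see the same marker at slightly different horizontal positions, and that disparity yields depth directly.

```python
def triangulate(x_left, x_right, baseline, focal_length):
    """Depth from rectified two-camera parallax.

    x_left, x_right: the marker's horizontal image-plane coordinates in the
                     left and right cameras (same units as focal_length).
    baseline:        distance between the two camera centers.
    Returns (x, z): the marker's lateral position and depth in the left
    camera's frame, via z = f * b / disparity.
    """
    disparity = x_left - x_right
    if disparity == 0:
        raise ValueError("zero parallax: marker at infinity")
    z = focal_length * baseline / disparity
    x = x_left * z / focal_length
    return x, z

# A marker imaged at 50 mm (left) and 40 mm (right) on 50 mm lenses,
# with the witness cameras 1 m apart:
x, z = triangulate(50.0, 40.0, baseline=1.0, focal_length=50.0)
# z = 50 * 1 / 10 = 5.0 m from the camera pair
```

The hard part Knoll credits to R&D is everything after this step: turning a sparse cloud of triangulated marker positions, captured on a moving location set, into clean skeletal motion.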
I'd like to add a few words about Fedkiw, the Stanford professor who advises ILM on deep technology issues, often providing tools used for some of their amazing vfx and animation work. Fedkiw and his graduate students are using some of this virtual human simulation in very interesting ways. For example, one of his students has been working with surgeons from Iraq in designing facial reconstructions. They often work with the medical school at Stanford. Interestingly, it was this work that led to the development of some remarkable technology for virtual acting. It's based on biomechanics. Fedkiw says: "We actually built a model of the human head using MRI images that give us a look at the head's internal muscle structure. They come out a little bit warped, and we had to correct for that. We also acquired the Visible Human data set to help refine it. We modeled the inside of the human head with all the muscles and bones, etc." Working with Motion Analysis, they developed some trial motion data streams. Using SIMM, the biomechanical animation software acquired by Motion Analysis, they linked the MoCap data to their virtual head. Motion Analysis used about 200 markers to develop some hi-res MoCap data. The data controls how much and in what way the virtual muscles of the head respond, yielding facial expressions that track the face actor with remarkable accuracy. This biologically accurate, functional 3D model of the head shows great promise both in medicine and in entertainment. It is the most technically and medically robust physical head simulation I've seen, and perhaps that is why it works so well when others have failed. Those of you who thought Fedkiw and his team only did fluid dynamics, think again.
Something for Everyone
There is actually a sophisticated, free facial animation system available for download. It's being used professionally by production companies, and it's worth taking a look at. Volker Helzle and I have had coffee and chats on several occasions over the past few years as I followed the development of a very interesting facial animation system that uses an underlying blend-shape library with some 65 control sliders capable of creating virtually any emotional expression you can imagine. This one reminds me a little bit of the Pendulum system, and yet it's very much its own system. It's part of the Artificial Actors project of the highly respected Institute of Animation, Visual Effects and Postproduction at the Filmakademie Baden-Württemberg in Germany. The documentation is available in English. It comes as a remarkable tool set that you can download at http://aistud.filmakademie.de/actor/88.0.html at no charge. Don't think that because it's free, it's not a very valuable tool. It's been developed at enormous expense by top engineer-artists, and I think you'll be impressed. Helzle tells me he would very much like to have you join in the development effort by downloading and using the tools and reporting back with suggestions and complaints. I can't tell you about the latest developments because they haven't been announced, but this is cutting-edge stuff.
Peter Plantec is a best-selling author, animator and virtual human designer. He wrote The Caligari trueSpace2 Bible, the first 3D animation book specifically written for artists. He lives in the high country near Aspen, Colorado. Peter's latest book, Virtual Humans, is a five-star selection at Amazon after many reviews.