In the second part of our Polar Express production diary, animation supervisor David Schaub chronicles the performance capture and "MoCap/Anim" processes.
This is the second of four installments in VFXWorld's The Polar Express production diary. Read Part 1.
In the first installment, I took you through the early days of production, and the events that led to director Bob Zemeckis's decision to execute this film using motion capture.
In the next two installments, I will focus on performance capture and the work animators do on top of that, which I am calling MoCap/Anim. The last installment, to be posted upon the release of the DVD later this year, will focus on the sequences that were keyframed.
Before getting down to the details, I should try to answer a common question: Why motion capture? Why not shoot live action? Here are a few reasons:
1) Achieving the painterly stylization of the Chris Van Allsburg book was the primary objective behind making this film. Remaining within a CG world allowed us to reproduce that look with a great level of control.
2) The director can focus on performances and not be burdened with the normal stresses of a live-action shoot (sets, lights, the weather, etc.). The feedback is instant, which would not be the case if we were to animate traditionally.
3) The actors can focus on acting (or reacting, as Tom Hanks describes it). It is like performing in-the-round without the worry of hitting a certain mark or acting to a particular camera.
4) The camera work is done after the shoot on a virtual set, where the director and DP can spend as much time as they like crafting the perfect camera move. As a result, there are some physics-defying camera moves in this film that could never have been achieved through traditional means.
5) The director no longer needs to settle on a single performance take -- he can mix and match the best moments from multiple takes. This gives him complete control over the performance, in that he can mold the performance to his liking.
There were a myriad of other benefits that Bob appreciated with this process as well:
"What's great about doing movies in performance capture is that you spend only 20% of your budget and then you know what you've got before you spend the 80%. When you make a live-action movie you send the director and a bunch of actors off and they spend 80% of the money and come back and you ask, 'OK, do we have a movie here or not?' So this was very controllable and very responsible."
In essence, we were able to create what was almost a live-action story reel, not so dissimilar to what one would have in more traditional animation, with the goal of locking down creative, story and technical elements. It also allowed Bob to incorporate specific performance-takes into the reel before committing to the full MoCap/Anim process for that shot. That is, it eliminated the need for traditional animation-blocking.
Performance Capture vs. Animation
It was the design of this production that dictated the style of animation. That style is driven by the performance of the actors, and Bob was very particular about keeping those performances intact. He is not billing this as an animated film because animators did not drive the majority of those performances. He has directed animation before (Who Framed Roger Rabbit?), but his vision for this film was something completely different. He is using live-action directing techniques, but rather than capturing the performances on film he is capturing those performances through an alternate means. With Tom as his lead actor(s), Bob wants to see those performances translated through the technology to the screen. He does not want animators to reinvent, reinterpret or otherwise spin their own take on Tom's physical performance.
The reality is that animators must be involved in the process. We found that performance capture could deliver 70-80% of the performance, so the result on screen is really a hybrid between the two worlds. There were also cases where Bob requested performance changes, and those changes were executed in animation rather than recapturing the actor on stage. Sometimes the changes were so extreme that the shot had to be reanimated from the ground up.
Our animation crew consisted of very experienced animators -- from Disney veterans (both traditional and CG) to some of the best talent in the stop-motion world. As animators, we are sensitive to the value of caricature and exaggeration. These are the cornerstones of our world. This project was different and called upon the animation team to accept a new method and style of bringing performances to the screen. The standard rules and techniques did not apply here, but the old dogs did indeed learn new tricks. It took the keen eye of our animators to quickly zero in on the most important elements that the technology missed, and deliver the best possible result within the allotted time and budget (which was tight).
I was approached by VFXWorld to document the history of this production in the form of a journal. However, a sequential chronology of events is difficult to follow due to the multitasking involved from day-to-day. Since each subtask boils down to a mini-production in itself, I organized the journal entries by subheading. All subheadings in this installment fall under the category of performance capture and MoCap/Anim. I hope this will make for easier reading and allow you to focus on the areas that you might be interested in. Keep in mind that this is not a complete picture, but a personal account of my days and nights on this project. I would like to thank the following leads for helping me round up and organize this material: Jeff Schu, Keith Kellog, Kenn McDonald, Chad Stewart and Ron Fischer.
Bidding was an ongoing process throughout production. The initial bids were based on story reels, and then got more refined as the shots became more clearly defined. The MoCap/Anim bids were a joint effort between Albert Hastings (integration supervisor) and myself. For shots where characters were a moderate distance from camera, most of a shot's budgeted effort was expended in the integration department. For shots where a character was close to camera, there was a larger proportion of effort in the animation department. This allowed us to focus animator effort on high-value moments.
Every effort had to be made to drive the cost down, so each round of bids was accompanied by a list of simplification proposals. From this list Bob would determine which simplifications he could live with. Some suggestions were simple, such as "frame this shot so the background kids are not in view." Other simplifications had greater creative implications, like proposing that the Scrooge puppet be made entirely of wood (solid body), rather than having to simulate the cloth nightshirt as it was in the original design. The simplifications and re-bids were numerous, but they ultimately brought the cost in on target.
The Crew (Animation)
Based upon our bids it was determined that 30 animators would be required to pull this off in the course of 16 months (three sequence leads and 27 animators). As sequences were added (and complexity increased), the animation crew grew to five leads and 40 animators.
Animation dailies were held each morning at 8:30, when I would give notes for all of the works-in-progress. Following dailies I would meet with animators and their leads to go over problem shots that needed more attention. Patrick Ramos (coordinator) would manage all of the notes and keep track of all shots in play.
Meetings were scheduled back-to-back through most of each day (production meetings, bidding, casting, simplifications, turnovers, etc.), and I relied heavily on the leads to work with the crew in my absence. I managed to get rounds in before the afternoon sweatbox with senior vfx supervisors Ken Ralston and Jerome Chen. It was here that we moved shots forward from integration to animation -- and animation moved forward to Bob for approval (or not). Several hours of each day were reserved for the keyframing tasks, which I will elaborate on in the fourth and final installment:
1) The Ticket Ride sequence, which includes the flyaway ticket, wolves, eagle, eaglet and, at one time, a family of bunnies, dancing bears and a beaver.
2) Smokey and Steamer (the engineer and coal-guy), who were voiced by Michael Jeter but animated traditionally from the ground up.
3) All of the other animals, including the caribou and flying reindeer.
4) Puppets in the broken-toy car (specifically, Scrooge), and an elaborate shadow-puppet sequence that was ultimately cut from the film.
The Shoot -- January-April 2003
It was a technical metropolis on stage, with 50 gigs of raw MoCap data being pumped into our facility each day. My main concern was that we were getting good video reference. This was our security blanket. I knew that if the MoCap technology failed for any reason we would still have solid video reference to animate to if it came down to it.
During the shoot, my days consisted of keeping an eye on the video footage. There were as many as 12 video cameras rolling to record the action from as many angles as possible, including three roaming cameras trained on the actors' faces at all times. I had a communication line from the video control booth to the camera operators on set. That way I could request a different angle, or a tighter face shot, for example. There were typically many takes of the same scene, and with each successive take the operators were able to zero in on the important elements as they became more familiar with the action. By the time Bob was happy with the performances, we had some pretty refined reference as well.
Several teams of technicians gathered and logged the media. This included digital audio files, video reference and body and face tracking data. All of these media included a master timecode clock signal, which ran continuously during each day of shooting.
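In spirit, lining media up against that master clock works something like the sketch below. This is purely illustrative (the function names, the 24 fps rate and the timecode values are my assumptions, not our production code): every file is stamped with an HH:MM:SS:FF timecode, so any two media streams can be aligned by converting to absolute frames and taking the difference.

```python
# Hypothetical sketch: aligning media by a shared SMPTE-style timecode at an
# assumed 24 fps. Function names and values are illustrative only.

def timecode_to_frames(tc: str, fps: int = 24) -> int:
    """Convert an HH:MM:SS:FF timecode string to an absolute frame count."""
    hh, mm, ss, ff = (int(part) for part in tc.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * fps + ff

def align_offset(track_start: str, master_start: str, fps: int = 24) -> int:
    """Frames to shift a track so it lines up with the master clock."""
    return timecode_to_frames(track_start, fps) - timecode_to_frames(master_start, fps)

print(timecode_to_frames("01:00:00:12"))           # frames at one hour + 12
print(align_offset("01:00:02:00", "01:00:00:00"))  # a 2-second (48-frame) offset
```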
Animation Rigs & Controls
The last installment defined the need for specific controls in our animation rig. What we ended up with is a complex character rig with multiple skeletons. Standard animation controls are the FK and IK skeletons. Snapping tools allow us to instantly align any (or all) joints of the FK skeleton to the IK skeleton and vice versa. The motion capture data is applied to the motion capture (MC) rig, and the FK and IK rigs can also be snapped to the MC rig at any point during a performance.
The upshot of this is that the animator can let the motion capture drive the performance on the MC rig, then at the desired frame snap the IK rig into alignment with the MC rig. The animator can animate from that point forward on the IK rig (by blending to the IK rig as the driver). Once the keyframed portion of the performance is complete, the animator can blend back to the motion capture on the MC rig. The snapping tool also allows us to snap the animation rig (FK or IK) to the MC rig on every frame, or at predefined intervals (on sixes, for example) and set animation keys automatically, if desired. This way we can transfer cryptic motion capture curves on to more familiar IK controls and manipulate the action through a more logical means.
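The idea of baking mocap onto sparse, editable keys can be sketched roughly as follows. This is not our rig code -- just a toy illustration, with made-up curve values, of sampling a dense per-frame channel "on sixes" and reading it back with linear interpolation so the action lives on familiar, keyable controls:

```python
# Illustrative sketch (not the production tool): bake a dense motion-capture
# curve onto sparse keys at a fixed interval ("on sixes"), then evaluate it
# with linear interpolation between those keys.

def bake_keys(mocap_curve, interval=6):
    """Sample per-frame mocap values at a fixed interval, always keying the last frame."""
    frames = sorted(mocap_curve)
    keys = {f: mocap_curve[f] for f in frames if f % interval == 0}
    keys[frames[-1]] = mocap_curve[frames[-1]]
    return keys

def evaluate(keys, frame):
    """Linearly interpolate between baked keys."""
    times = sorted(keys)
    if frame <= times[0]:
        return keys[times[0]]
    if frame >= times[-1]:
        return keys[times[-1]]
    for t0, t1 in zip(times, times[1:]):
        if t0 <= frame <= t1:
            w = (frame - t0) / (t1 - t0)
            return keys[t0] * (1 - w) + keys[t1] * w

dense = {f: f * 0.5 for f in range(25)}  # a fake per-frame rotation curve
keys = bake_keys(dense)                  # keys land on frames 0, 6, 12, 18, 24
print(evaluate(keys, 9))                 # 4.5 -- halfway between the keys at 6 and 12
```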
There is also an offset-rig that allows the animator to animate on top of motion capture without turning it off. The original intent of this tool was to push the MC joint rotations to produce broader actions and stronger poses. When it became clear that performances were to be replicated precisely, the offset tool was used for more subtle tasks such as offsetting the character's hand to align with a prop in the scene, for example.
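Conceptually, the offset rig is an additive layer: the animator's offset curve is summed with the live motion-capture channel rather than replacing it. A minimal sketch of that idea (channel values and names are invented for illustration):

```python
# Hedged sketch of the offset-rig concept: per-frame result is the mocap value
# plus the animator's offset, with zero offset where nothing is keyed.
# These names and numbers are illustrative, not the production rig's API.

def apply_offset(mocap, offsets):
    """Layer an animator offset curve additively on top of a mocap channel."""
    return {f: v + offsets.get(f, 0.0) for f, v in mocap.items()}

mocap = {1: 10.0, 2: 10.2, 3: 10.4}
offsets = {2: 0.5, 3: 0.5}              # nudge the hand onto a prop from frame 2 on
result = apply_offset(mocap, offsets)
print(result[1], result[2], result[3])  # 10.0 10.7 10.9
```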
The tools were designed to accommodate almost any situation that we could anticipate. The reality is that every new shot in the door seemed to present a new challenge. However, with these tools the animators were able to solve most problems that came our way. The animators also found many other creative workflow techniques beyond those described here.
Face Rig
Development work on PFS (Performance Facial System) continued well into production.
The anatomical layout of the muscle system in PFS was unwieldy at the outset. There were over 300 individual muscles in the face, each with its own animation control. That means that each muscle channel only affects a small localized region of the face. This is how all facial work was animated during the first 10 minutes of the movie, and it was an extremely cumbersome process. Over time we were able to build pose libraries for the face shapes. This was accomplished by capturing poses from motion capture, then refining those poses with the animation controls in PFS and saving them in the pose libraries.
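The pose-library workflow amounts to snapshotting the muscle channels from a solved frame, storing the snapshot under a name, and blending it back over a face later. A toy sketch of that idea (the channel names, weights and blend math here are my own illustration, not PFS itself):

```python
# Minimal sketch of a facial pose library: save named snapshots of muscle
# channel values and blend a stored pose over the current face.
# Channel names and the linear blend are assumptions for illustration.

library = {}

def save_pose(name, channels):
    """Snapshot the muscle channel values under a pose name."""
    library[name] = dict(channels)

def apply_pose(current, name, weight=1.0):
    """Blend a stored pose over the current channel values."""
    pose = library[name]
    return {ch: current.get(ch, 0.0) * (1 - weight) + pose.get(ch, 0.0) * weight
            for ch in set(current) | set(pose)}

save_pose("smile", {"zygomatic_L": 0.8, "zygomatic_R": 0.8})
face = apply_pose({"zygomatic_L": 0.0, "zygomatic_R": 0.2}, "smile", weight=0.5)
print(face["zygomatic_L"], face["zygomatic_R"])  # 0.4 0.5
```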
The Eyes -- February 12, 2003
A critical problem area was the region around the eyes. The actors had markers on their eyelids, but those markers would often become occluded under the fat pad between the lid and the brow. We would occasionally see some good data in this region -- but it was altogether unreliable, since the area around the eyes is obviously very noisy and unpredictable. This was also a problem in our original test -- as documented in the last installment. In that case, the artist literally animated the muscles around the eye to produce the desired shape change for each movement of the eyeball. The problem wasn't going to go away, so a better solution was needed. It was decided to turn off the motion capture in this area and develop a system that would use the animated rotations of the eyeball to drive the shape-changes in the flesh around the eye.
Procedural Eye Shapes -- February 21, 2003
I was able to define the desired shape-changes around the eye for all of the extremes (up, down, right, left) with video reference and photographs. Using the eyeball rotation values, senior character setup TD Eugene Jueng took a stab at driving the shape changes procedurally with PFS muscle values. He was able to get good results for extreme left and right because the shape change is a simple lateral pull of the muscle. However, we found that there was no way to achieve the desired shape change for the extreme up and down poses with muscle values alone. These shapes would have to be modeled.
March 4, 2003
Lead modeler Marvin Kim set out on the task of modeling the extremes based on the references that I provided. He was able to test the extremes and make small adjustments on-the-fly by integrating the models as blend shapes driven by a rotating eyeball.
The solution would be much more complicated in the rigging world because of all the other controls and procedural elements that tie into it.
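Stripped of all that rig complexity, the eyeball-driven portion boils down to mapping the eye's pitch to weights on the modeled "look up" and "look down" blend shapes. A sketch under assumed rotation limits (the degree values are invented for this example):

```python
# Illustrative sketch of driving lid blend-shape weights from eyeball rotation:
# positive pitch drives the modeled "look up" shape, negative pitch the
# "look down" shape, normalized against assumed rotation limits.

MAX_UP, MAX_DOWN = 30.0, 40.0  # assumed rotation limits in degrees (made up)

def lid_shape_weights(pitch_deg):
    """Return (look_up, look_down) blend-shape weights clamped to [0, 1]."""
    up = max(0.0, min(1.0, pitch_deg / MAX_UP))
    down = max(0.0, min(1.0, -pitch_deg / MAX_DOWN))
    return up, down

print(lid_shape_weights(15.0))   # (0.5, 0.0) -- halfway to the "look up" extreme
print(lid_shape_weights(-40.0))  # (0.0, 1.0) -- full "look down" shape
```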
March 13-April 3, 2003
Senior character setup TD Michael Laubach joins the effort, and over the course of several weeks he is able to get all of the controls, shapes and procedural elements working together. While the automation works beautifully, the animator still has the ability to override the automation -- which is standard in all of our rigs at Imageworks.
Note: Another problem that we faced throughout production was that eyelines in the final renders did not always match the renders approved out of animation. We discovered several reasons for this phenomenon, but in the end the surest way to correct the problem was to make final adjustments to the eyes after evaluating the lit version. We took the opportunity to make appropriate adjustments on the most significant shots, but did not have the luxury of perfecting every one.
The original motion capture test (as described in part one of this series) dictated the need for a dedicated department to process all of the digital assets created during the shoot and create official shots for the downstream departments. The integration department was established by Alberto Menache, and headed by Albert Hastings. There were five integration leads, and 25 integrators at full capacity. In the Polar Express pipeline, the integration department performs the integration task twice. It is done once roughly (without facial data) to pre-visualize and frame shots and then again as a final integration (with facial data) once the camera has been finalized. This approach limits facial tracking to the material used in approved shots, since the numerous face markers take much more time and effort to process than body markers.
Rough Integration -- February-June 2004
To prepare for rough integration, the production editorial department loads all the reference video into their Avid and reviews takes chosen on set by the director. Particularly tricky was the number of tracks running in parallel: the video tracks were not only alternate angles of the same performance, but alternate performance takes as well (each take with its own 12 camera angles). This process gave the director absolute control over the performances. No longer did Bob need to settle on the best performance take -- he could mix and match the best performance elements from multiple takes. For example, our Hero Boy would be in track 2 for most of the shot, then switch to track 5 for the delivery of his dialogue (which may be an alternate take), then switch to video track 8 (performance take 3) for the rest of the shot.
To accommodate the director's workflow, Imageworks developed a system to track editorial lineups to a new level of detail. Prior to Polar Express, the facility's work was often done against background plates, and frame counts would always match footage to layers of animation and effects renders. This show presented an entirely new challenge. None of the material captured on stage was considered a background plate, and the editorial lineup for a single shot includes multiple cuts within the action. A tool was developed to display lineups in an Avid-like timeline, reorganizing the raw video tracks into separate tracks, one per character. The software tracks those lineups, and produces video reference planes visible to artists working on assemblies and shots in both Kaydara's Filmbox (now Alias MotionBuilder) and Maya.
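A per-character lineup of this kind can be reduced to a simple edit list: for each character, a set of frame ranges naming which performance take drives that span, echoing the Hero Boy example above. The frame ranges and take names below are invented for illustration:

```python
# Hypothetical sketch of resolving an editorial lineup: each character carries
# a list of cuts (start, end, take) saying which performance take drives which
# frame range. Values are made up; this is not the production lineup tool.

lineup = {
    "hero_boy": [(1, 40, "take2"), (41, 60, "take5"), (61, 100, "take3")],
}

def take_at(character, frame):
    """Return the performance take that drives this character at this frame."""
    for start, end, take in lineup[character]:
        if start <= frame <= end:
            return take
    raise ValueError(f"no take covers frame {frame} for {character}")

print(take_at("hero_boy", 50))  # take5 -- the dialogue delivery
print(take_at("hero_boy", 70))  # take3
```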
Next, the motion capture data is applied to our low-resolution character rigs in Filmbox, lined up to timecode and cut to produce a rough assembly that matches the video edit, called a Performance Assembly. The Performance Assembly is theater-in-the-round and includes the entire action of a scene, with no attempt to finesse the animation between selected takes. The only additional animation added might be some rough prop animation. Since the cameras have not been defined at this stage, all characters must be processed and brought into our virtual set.
A team of animators jumped on board to assist the rough integration process. This enabled us to get more work through the pipe, but it was also clear that animators would need to understand the tools that integrators were using in MotionBuilder. The package has some very handy retargeting and blending tools that would be extremely valuable for the kinds of tasks we would be handling in animation as well.
Once the camera views and shots are defined in layout, the integration department rebuilds each of those shots from the ground up. It is essentially the rough integration process again, but this time with much more finesse and attention to detail. Since the camera has been defined we know which characters can be dropped out. For example, if the camera cuts to a closeup of one of the kids, all other kids in the virtual set can be eliminated from that shot since they are not seen.
It is the integration department's responsibility to extract as much useable data from the performance capture sessions as possible. The goal is to be faithful to the original performance with no further interpretation. For this, the integrators used the unique animation layering tools in MotionBuilder to match the characters to the set. The non-linear motion editor was invaluable in stitching and blending together different motion takes. Another problem was that the characters often had a terrible case of the shakes. This was due to noisy motion capture, which was especially noticeable in the quiet moments of a performance. Thankfully, the noise is disguised when the performance is broader and more active. Several filtering algorithms were developed to alleviate the problem, and for each noise condition there seemed to be an effective filter (or combination of filters) that would resolve the problem.
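The simplest filter of the kind described is a centered moving average over a mocap channel; the production filters were considerably more sophisticated, so treat this only as a sketch of the idea, with invented sample values:

```python
# One plausible noise filter: a centered moving average that smooths jitter in
# quiet moments of a mocap channel. A sketch of the concept only -- the show's
# actual filters (and their parameters) were more sophisticated.

def moving_average(samples, radius=2):
    """Smooth a per-frame channel with a centered window of 2*radius+1 frames."""
    out = []
    n = len(samples)
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = samples[lo:hi]           # window shrinks at the curve's ends
        out.append(sum(window) / len(window))
    return out

noisy = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0]    # a jittery "quiet moment"
print(moving_average(noisy, radius=1))
```

Note that a flat channel passes through unchanged, which is exactly what you want: the filter should only remove jitter, never shift a held pose.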
While body-tracking data was delivered to integration on skeletons in the proportions of the actors (which had to be retargeted), facial data was delivered as stabilized 3D tracks with neutral face poses attached (default pose for that character). The integration department solved this data onto the character facial rig. Over time integration developed a secondary system for adjusting the facial data during application. The need for this fine-tuning declined toward the end of production.
Once the integration department has combined all of these media files based on the editorial department's edit list, the scene is exported to Maya for delivery to the animation department.
Animation director David Schaub joined Sony Pictures Imageworks in 1995. On The Polar Express, he worked with director Robert Zemeckis and noted senior visual effects supervisors Ken Ralston and Jerome Chen. He was previously a supervising animator on Stuart Little 2, for which he and his team received a VES Award (Visual Effects Society) for "Best Character Animation in an Animated Motion Picture." His other film credits at Imageworks include the Academy Award nominated Stuart Little, Cast Away, Evolution, Patch Adams, the Academy Award nominated Hollow Man, Godzilla and The Craft.