'The Polar Express' Diary: Part 1 -- Testing and Prepping

In the first of our four-part series, animation supervisor David Schaub chronicles the process of bringing Robert Zemeckis' The Polar Express from conception to the big screen.

The Polar Express animation supervisor David Schaub.

With The Polar Express, based on the popular children's book by Chris Van Allsburg about a magical journey to the North Pole, director Bob Zemeckis, star Tom Hanks and the film crew achieved a new kind of live action/animation hybrid as a result of breakthrough performance capture technology from Sony Pictures Imageworks called ImageMotion.

VFXWorld asked me to document my account of this production from my perspective as the animation supervisor. Thus, my focus will be on the animated portions of the film, along with brief insights into other areas of production. Virtually every element of live action and animation applied. From an animator's point of view, I was drawn to Polar Express not only by the prospect of exploring a new process, but also by the opportunity to create a wide range of animation performances using more traditional means as well. All of the animals, the train, flying reindeer, elves, puppets and more would be animated by hand.

In this first installment, I will walk you through the earliest days of production, and the series of events that led to my involvement.

The Polar Express presented a rare opportunity to explore something completely different. At the heart of this equation was one of the most cinematically adept filmmakers of our generation, Bob Zemeckis. Given his history of applying talent and technology to find new ways to bring stories to the screen, this project promised to be an adventure.

Part of the magic of the book was in its painterly illustrations. From a production design standpoint, Zemeckis was intent on maintaining the integrity of the book's artwork. If he were to do this as a live-action film, he knew that he would lose the paintings and, with them, much of the magic. It was the magical spell of the book (the sense of mystery and wonder so artfully conveyed on the page) that he wanted to retain. He had to find a way to bring the paintings in the book to life, yet had no preconceived notions about how that look would ultimately be achieved on film. At first he was striving to replicate the brush-stroke imagery of the Chris Van Allsburg paintings, but that look ultimately gravitated to the more refined interpretations of those paintings that production designer Doug Chiang was producing.

To explore his painting-in-motion idea, Bob turned to Ken Ralston, five-time Oscar winner and senior visual effects supervisor at Imageworks, who has been Bob's vfx guru as far back as Who Framed Roger Rabbit. Ken made some early explorations applying 2D approaches with vfx supervisor Sheena Duggal. The first test used a combination of image processing tools and customized brushes that allowed an entire image to be treated as a pastel painting. Another approach used optical flow software to create motion maps for each pixel in a series of live-action images. This motion analysis was used as a base to create time-coherent painterly effects over live-action sources. Artists would paint directly onto selected single frames. These paintings were used as keyframes, and the optical flow software applied motion to the paintings over time. The areas of the image that did not work using optical flow out of the box were manipulated in the composite, using a combination of 2D painting, tracking and warping. The results were very promising, and development efforts in this area continued to move forward for many months.
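As a rough illustration only (this is not the software used on the test, and every name in it is hypothetical), the idea of letting a per-pixel motion map carry a painted keyframe forward in time can be sketched in a few lines of Python. The sketch assumes the flow field is already known and, for simplicity, reuses the same field on every frame; real optical flow would be estimated separately for each pair of frames.

```python
def advect_strokes(strokes, flow):
    """Move each painted stroke by the motion vector at its nearest pixel.

    strokes: list of (x, y, color) tuples painted on a keyframe (hypothetical format).
    flow:    2D grid, flow[row][col] = (dx, dy) motion of that pixel to the next frame.
    """
    moved = []
    for x, y, color in strokes:
        # Clamp the lookup so strokes near the border still find a flow vector.
        row = min(max(int(round(y)), 0), len(flow) - 1)
        col = min(max(int(round(x)), 0), len(flow[0]) - 1)
        dx, dy = flow[row][col]
        moved.append((x + dx, y + dy, color))
    return moved


# Assumed toy data: every pixel drifts one pixel to the right per frame.
flow = [[(1.0, 0.0), (1.0, 0.0), (1.0, 0.0)],
        [(1.0, 0.0), (1.0, 0.0), (1.0, 0.0)]]
keyframe_strokes = [(0.0, 0.0, "ochre"), (0.0, 1.0, "umber")]

frame_1 = advect_strokes(keyframe_strokes, flow)
frame_2 = advect_strokes(frame_1, flow)
```

Because the strokes follow the footage rather than being repainted on each frame, they stay coherent over time, which is the property the 2D test was after.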

Ken Ralston, five-time Oscar winner and senior visual effects supervisor at Imageworks, and production designer Doug Chiang. Ken Ralston photo courtesy of Sony Pictures Imageworks. Doug Chiang photo by Giles Hancock.

CG supervisor Alberto Menache suggested motion capture as another possibility for acquiring human performances. If we were to take a 3D approach, the desired look could be achieved with great control and flexibility. There was a lot of initial skepticism at the time, but nothing could be ruled out at this early stage. Alberto had extensive experience in motion capture (he wrote a book on the subject), and developed a motion-capture pipeline for crowd systems on Spider-Man at Imageworks. Ken agreed to a test, side-by-side with another bluescreen test that was getting underway. Alberto focused on software development while Scott Stokdyk was appointed as vfx supervisor for the test itself.

Both tests (2D and 3D) consisted of eight shots. For the motion capture version of the test, Tom was to play the child as well as the conductor. Knowing that MoCap would only be the front end of the process, I was asked to supervise the animation work to whatever extent was necessary. Retaining Tom's performance without reinterpretation was of the highest priority. Prior to this show, I had a small amount of experience with motion capture. I knew enough to be skeptical and a little fearful of the results that could be achieved using this method, but I was ready for the ride.

All of the innovations that we made along the way came about as a direct result of Bob's vision and desired workflow. He came to us with his ideas, and it was our job to find a way to make those ideas a reality. From the photorealism and performance of a Stuart Little to the grace and power of a Spider-Man, Imageworks is a company that is constantly pushing the boundaries. Here was another opportunity to jump way outside the box.

The bedroom stage set-up. All Polar Express images © 2004 by Warner Bros. Ent. Inc. Previously published in The Art of the Polar Express published by Chronicle Books.

July 10, 2002

We had been exploring options and strategies for several months now and were ready to conduct The Big Test. Three sound stages were set up, each employing a different technique. If we were going to make this movie, the way we would make it would be determined as a result of this test.

The first stage followed traditional filmmaking rules with a set filled with costumed actors, props, sets and lighting. The next featured costumed actors in front of a greenscreen where everything else would be added as a visual effect. The final stage was largely empty except for the motion capture equipment. Tom was outfitted with face markers and a spandex MoCap suit. For the purpose of this test, we used existing motion capture technology. He delivered his master performances first (on a large stage), then went back and re-delivered the same performances in close-up (ADR-style) with careful direction from Bob to be sure that his facial performance was in sync with the body language already captured. This was a painstaking approach from a performance perspective, and left very little room for the actor to improvise.

We all realized that there was a long road ahead, and that these were just the first steps. We had to produce a series of shots (through whatever means) so that the look of the painterly world could be explored on a CG canvas. Performance methodology at this point was secondary. If the test succeeded, then Imageworks would shift its focus to developing new technology for performance capture. It was too early to know what was possible, or where this was all headed.

Remember that the purpose of the test was twofold: first, to determine which technique produced the best results for the artistic vision of the project and second, but no less important, to determine what aspects of that technique worked and what improvements were necessary to effect a viable production pipeline. We knew that we were operating at this point with existing available technology, and it was just as important to us to recognize its present shortcomings.

July 20, 2002

The motion capture for the body was tracked by the MoCap vendor and put in the capable hands of Albert Hastings, our integration supervisor. Using Kaydara's FilmBox, he applied the data to our skeletons, and the adult Tom was retargeted to the physical proportions of an eight-year-old child in the shots that called for it. From here all of the shots were prepared for animation by upgrading the FilmBox rigs to the animation control rigs in Maya.
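To give a feel for what retargeting means here, the following is a deliberately simplified Python sketch, not the FilmBox workflow itself. It assumes the captured joint rotations are kept as-is and only the bone lengths are swapped for those of the smaller character; the 2D chain, the function names and all of the measurements are hypothetical.

```python
import math


def forward_kinematics(bone_lengths, joint_angles_deg):
    """Return 2D joint positions for a simple chain that starts at the origin.

    Each angle is relative to the previous bone, as in a basic FK rig.
    """
    x, y, heading = 0.0, 0.0, 0.0
    positions = [(x, y)]
    for length, angle in zip(bone_lengths, joint_angles_deg):
        heading += math.radians(angle)
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        positions.append((x, y))
    return positions


def retarget(joint_angles_deg, target_bone_lengths):
    """Reuse the captured rotations on a skeleton with different proportions."""
    return forward_kinematics(target_bone_lengths, joint_angles_deg)


# Assumed, illustrative numbers (metres): an adult arm and a child's arm.
adult_bones = [0.45, 0.45]
child_bones = [0.30, 0.28]
captured_angles = [90.0, -90.0]  # upper arm raised, forearm pointing forward

adult_pose = forward_kinematics(adult_bones, captured_angles)
child_pose = retarget(captured_angles, child_bones)
```

The pose reads the same on both skeletons, but the hand ends up in a different place in space, which is one reason retargeted characters may no longer line up with props and sets.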

The animation control rig seemed sufficient at the time, at least for the task at hand. Each character had two control skeletons: one that carried the motion capture data, and one that carried the animation. There was a blend control that allowed animators to mix any combination of the two, including 100% of one or the other. At any point, an animator could blend out of motion capture and keyframe a section of the performance (if necessary), and vice versa. On the surface it was very logical, and I thought that this would allow us to get through most problems that I could anticipate. Three animators were crewed to support this test: David Earl Smith, Keith Smith and Alex Tysowski.
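The blend control can be pictured with a minimal Python sketch. This is an assumption-laden illustration rather than the actual Maya rig: each pose is reduced to one rotation value per joint, the joint names are made up, and the mix is a plain linear interpolation (a production rig would more likely blend full 3D rotations).

```python
def blend_pose(mocap_pose, keyframe_pose, weight):
    """Mix two poses joint by joint.

    weight = 0.0 gives 100% motion capture, weight = 1.0 gives 100% keyframe
    animation, and anything in between is a linear mix of the two.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0.0 and 1.0")
    return {
        joint: (1.0 - weight) * mocap_pose[joint] + weight * keyframe_pose[joint]
        for joint in mocap_pose
    }


# Hypothetical single-axis rotations in degrees for one frame.
mocap = {"elbow": 40.0, "wrist": 10.0}
keyed = {"elbow": 60.0, "wrist": -10.0}

all_mocap = blend_pose(mocap, keyed, 0.0)
quarter_keyed = blend_pose(mocap, keyed, 0.25)
all_keyed = blend_pose(mocap, keyed, 1.0)
```

Animating the weight over a handful of frames is what lets an animator ease out of the captured motion, keyframe a section by hand, and ease back in.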

An early conceptual look of Tom Hanks as the conductor.

August 5, 2002

Like most endeavors of this sort, things rarely work the way they are supposed to right out of the box.

Since we are capturing performances with existing motion capture technology, we find that the data is very noisy and needs additional filtering. However, when we try applying a noise filter, it results in losing subtle nuances in the acting. Those nuances will have to be animated back into each of the affected shots (using video reference shot on stage). We find that very little movement was captured in the upper torso, as the MoCap solver seems to have lumped all of the subtle spine rotations into the pelvis. Therefore, we need to animate the spine to achieve a flexible, organic quality. The MoCap solutions for head rotations are also very squirrelly. A simple head rotation around the axis of the spine is interpreted as a wobbly rotation affecting all three axes. Then we find that the characters do not line up with the environment and props in our 3D world exactly as they were intended. However, it would be the goal of the production to faithfully display the actor's performance without substantial animator enhancement. We grind through the process using sheer brute force, taking comfort in the fact that the test is only eight shots, and not impossible.
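The trade-off between filtering noise and keeping nuance is easy to demonstrate. The Python sketch below uses a simple moving average as a stand-in; the filter actually applied to the data is not documented here, and the curve values are invented for illustration.

```python
def moving_average(samples, window):
    """Smooth a curve by averaging each sample with its neighbours.

    The window is truncated at the ends of the curve.
    """
    half = window // 2
    smoothed = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        smoothed.append(sum(samples[lo:hi]) / (hi - lo))
    return smoothed


# Assumed rotation curve in degrees: a brief 3-degree head tilt on frame 5.
curve = [0.0] * 5 + [3.0] + [0.0] * 5
smoothed = moving_average(curve, 5)
```

A filter wide enough to calm the jitter treats a quick, small gesture as if it were noise too: here the 3-degree tilt is reduced to well under 1 degree and smeared across neighbouring frames, which is why such details had to be animated back in from video reference.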

August 10, 2002

Alberto rolls out the new version of PFS (Performance Facial System). This is a muscle-driven face rig with animation controls for every muscle in the equivalent human face. The muscles deform the underlying fatty and ligament layers of the face, which in turn deform the high-res geometry that ultimately gets rendered. This is built upon the engine that was used to animate the cat faces in Stuart Little 2. In addition to manipulating the muscles through animation controls, Alberto designed the system so that the muscles could be driven from other sources as well, such as motion capture. For the initial test there were 100 markers on the face. Groups of markers are used to calculate the contribution of each specific muscle. At this point there are animation controls for each muscle in the face. The ultimate goal is to develop higher-level controls so that muscle values can be combined and saved as poses, similar to a blend-shape system. These poses can be built by animation leads, or captured from the performance and modified by animators to match the video reference. This is the first incarnation of PFS with MoCap used as an input, so there is much development yet to be done.
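How a group of markers might drive a single muscle value can be sketched as follows. This is purely a hypothetical illustration and not how PFS is implemented: it assumes a muscle's activation is the average displacement of its markers from a neutral pose, normalized by an assumed full range of travel, and it treats a saved pose as nothing more than a named set of muscle values.

```python
import math


def muscle_activation(neutral, current, marker_ids, full_range):
    """Estimate a 0..1 muscle value from the markers assigned to that muscle.

    neutral / current: dicts mapping marker id -> (x, y) position.
    full_range: displacement (same units) treated as a fully contracted muscle.
    """
    total = 0.0
    for marker in marker_ids:
        nx, ny = neutral[marker]
        cx, cy = current[marker]
        total += math.hypot(cx - nx, cy - ny)
    mean_displacement = total / len(marker_ids)
    return min(1.0, mean_displacement / full_range)


# Hypothetical marker positions (centimetres) for two cheek markers.
neutral = {"cheek_1": (0.0, 0.0), "cheek_2": (1.0, 0.0)}
smiling = {"cheek_1": (0.0, 0.5), "cheek_2": (1.0, 0.5)}

cheek_raise = muscle_activation(neutral, smiling, ["cheek_1", "cheek_2"], 1.0)

# A saved pose is then just a named set of muscle values for animators to reuse.
pose_library = {"half_smile": {"cheek_raiser": cheek_raise}}
```

Storing poses as muscle values rather than raw marker positions is what would allow them to be reused, blended or adjusted against the video reference.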

August 15, 2002

Face data is tracked and retargeted to the child character's proportions using the muscle build and behavior scripts that were produced by Alberto. Since the face and body performances were not done together, the face performance is pasted onto the body performance after it is brought into Maya.

There is an impressive amount of movement coming through in the cheek area, which historically looks stiff in CG characters. The cheek deformations make the character look a little more pliable and organic. However, the movement captured elsewhere in the face is pretty minimal.

We are getting a decent amount of data to sell the jaw rotations, but the detail around the lips is missing. This is because markers cannot be placed inside of the actor's lips where they are needed: It's too uncomfortable and distracting for the actor, and the markers would be occluded most of the time anyway (they would not be trackable). All of those dialog shapes (phonemes) will have to be animated by hand using the muscle controls in PFS. The markers on Tom's eyelids are essentially occluded by the fat pad above the lids when his eyes are open. Because of this, the surface area around the eyes is very noisy and unpredictable. The solution is to turn off the MoCap in the area between the brow and cheekbone and animate the muscles in that region by hand.

Reference photo for Smokey and a conceptual drawing of Steamer.

September 5, 2002

One by one we have pounded our way through most of the shots. David Earl Smith was at it all weekend (and through the night) animating the muscles around the eyes on a conductor shot to support the eye movement that he animated earlier, since eye movements also cannot be captured. Our original hope was that the shape changes would come through in the motion capture, but they had to be turned off because of the occlusion problems and the noisy, unreliable data that results. It is a painstaking process, because a shape change is needed each time the eyes move. We get through it, but the notion of accommodating performance changes at this point would make the task insurmountable. We could never get through an entire feature this way, so a better solution is needed. I remind myself that we are not talking about an entire feature at this point; for now it's just a test. Getting acceptable animation out to render is our objective, and we would cross the bigger bridge later when (and if) it ever came to that.

Looking to the future, I begin to define the ideal tool-set that would allow us to make the best use of the MoCap data, and give animators the flexibility to make changes easily.

Snap tools to copy/paste poses from the motion capture rig to the animation rig (on selected frames, or every frame, if desired).
IK and FK MoCap-offset rig to animate on top of existing MoCap.
Spine-joint redistribution tools.
Animator-friendly user interface for PFS and a fully functional library of face poses (rather than animating each muscle independently).
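
Of the items on this wish list, spine-joint redistribution is perhaps the easiest to picture in code. The Python sketch below is only a guess at the principle and not the tool that was eventually built: it assumes a single rotation axis, takes the rotation that the solver lumped into the pelvis, and hands an even share of it to each spine joint. The function name and the default split are hypothetical.

```python
def redistribute_spine(pelvis_rotation, spine_joint_count, pelvis_share=0.4):
    """Spread a rotation that was solved entirely onto the pelvis up the spine.

    Returns (new_pelvis_rotation, list_of_spine_joint_rotations). The total
    rotation is preserved, so the chest ends up with the same orientation.
    """
    if spine_joint_count < 1:
        raise ValueError("need at least one spine joint")
    new_pelvis = pelvis_rotation * pelvis_share
    per_joint = (pelvis_rotation - new_pelvis) / spine_joint_count
    return new_pelvis, [per_joint] * spine_joint_count


# Assumed example: the solver put a 30-degree twist entirely on the pelvis.
pelvis, spine = redistribute_spine(30.0, 3)
```

With the default split, the pelvis keeps 12 degrees and each of the three spine joints takes 6, so the upper body arrives at the same orientation through a curve rather than a single hinge.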

September 19, 2002

We have succeeded in translating Tom's performances with an impressive amount of detail. Bob is very happy with the animation and look that is being achieved. He abandons the 2D tests and commits to the performance capture approach. He wants to take this test to the next level: that is, refine this test and pitch it to the Warner Bros. executives. Before he is willing to do that, he wants some further modeling work done on our hero kid. After seeing the CG character in action, he thinks that he looks much too old. The character in the book should be about eight years old, but our kid looks to be about 14. He is much too tall, and the proportions of his face are those of a teenager. The character is moved back into modeling, where we knock about 10" off his height. His forehead is elongated, eyes are made bigger and the head is made bigger and rounder. I am concerned that the animation may not survive this upgrade.

September 24, 2002

We were reminded of the importance of precision in character modeling when we were asked to make structural changes after animation had been completed. As I feared, the animation data intended for the original model skewed the topology of the new revision. In order for the animation to translate correctly, the data (animation AND MoCap) would have to be retargeted to the new model. The process involves creating a new muscle layout for the topology of the new face, requiring a PFS script that had not been perfected yet. With our delivery date of Oct. 8 looming, there was no time to go through this process and deal with any of the potential problems that might occur with such a technical solution. As this initial test was designed to explore the look and basic process (with the look being of primary importance at this stage), we steamed ahead with facial animation where necessary on the affected shots.

Body reference sheet for the children.

October 8, 2002

The test was a big success at Warner Bros. It took us three months to produce eight shots, and now there is talk of an entire feature within the course of 18 months! The idea seems overwhelming. The silver lining is that we know what the problems are, and I am convinced that we will solve each one of them by stacking one innovation on top of the next. Now all I have to do is find ambitious animators who are willing to tackle the enormous and unique challenges ahead.

December 2002

Jerome Chen teams with Ken Ralston as a senior visual effects supervisor for the show. Look-development continues and impressive breakthroughs are made. In the two months since our test and Bob's decision to make this movie, Demian Gordon and his crew refine the capture system to the point where both face and body can be captured in a 360°, 10' x 10' volume. This is a huge leap forward. A single character inside the capture volume requires 64 Vicon cameras (eight for the body plus 56 for the face). Development continues to the point where four actors are captured together in the volume with impressive results. This requires only eight additional cameras, for a total of 72 (16 for bodies and 56 for the faces). Suddenly the idea of making a movie this way seems very feasible.

Of course, there will be some level of animation required on these characters, but the goal is that the bulk of the actors' performance be delivered through the performance capture system. If the technology performs as expected (and demonstrated), then we will be dealing with a visual effects film where live actors are replaced with performance-captured characters. That will be our visual effects palette where we can add the traditionally animated characters, animals, effects and lighting.

At this point, I am enticed by many sequences that require extensive keyframed animation. None of the animals in this film will be motion captured. There will be wolves, eagles, rabbits, caribou, flying reindeer and, of course, the train itself. There is also talk that the elves might be fully keyframed as well. Over the next few months I am tempted by an increasing number of great animation opportunities, and am happy to accept the position of animation supervisor on the show. The next two installments of this diary will document that journey.

Animation director David Schaub joined Sony Pictures Imageworks in 1995. On The Polar Express, he worked with director Robert Zemeckis and noted senior visual effects supervisors Ken Ralston and Jerome Chen. He was previously a supervising animator on Stuart Little 2, for which he and his team received a VES Award (Visual Effects Society) for Best Character Animation in an Animated Motion Picture. His other film credits at Imageworks include the Academy Award-nominated Stuart Little, Cast Away, Evolution, Patch Adams, the Academy Award-nominated Hollow Man, Godzilla and The Craft.