Bill Desowitz gets to the bottom of how and why The Polar Express is the next hybrid CGI breakthrough with senior visual effects supervisor Jerome Chen.
When Robert Zemeckis launched his digital center at USC four years ago, he sat at a computer monitor and confided that his dream was to someday shoot an entire movie, virtually. Well, he soon got his chance with Warner Bros.' The Polar Express, the all-CGI Christmas extravaganza with Tom Hanks, based on Chris Van Allsburg's popular illustrated children's book, which opens today [Nov. 10, 2004] in both standard 35mm and 70mm IMAX 3D. Along with Sky Captain and the World of Tomorrow and The Incredibles, The Polar Express represents a daring technological leap in CGI moviemaking. In wanting to capture the spirit of Van Allsburg's painterly look, Zemeckis chose to experiment with a new form of performance capture designed by Sony Pictures Imageworks and Vicon called ImageMotion. VFXWorld recently spoke with senior visual effects supervisor Jerome Chen about the unique challenges of making The Polar Express and what it means to the 3D community, which has already begun debating its technical and artistic merits.
Bill Desowitz: How did this process start?
Jerome Chen: [Senior visual effects supervisor Ken Ralston and I] were the creative and technical supervisors who were charged with coming up with a way of achieving this movie after a preliminary meeting with Bob, who basically said he didn't want keyframe animation. We were pretty sure it would have to be CG, since it would be hard to do this live action. And then he wanted to preserve the visual spirit of the book, the pastel drawings. So we even toyed with the notion of shooting live action and treating it in post à la What Dreams May Come. You still have all of the effects plus on top of that an artistic process, which is really daunting.
BD: Were you already experimenting with performance capture?
JC: We had done motion capture for movies like Spider-Man and other movies at Imageworks. But let me make a distinction between motion capture and performance capture. We started calling it performance capture because we were grabbing the entire performance at once, meaning facial and body. The other stuff we had done for stunt sequences was for body performance. When I started looking into the state of motion capture at the beginning of the show in June 2002, it felt pretty primitive to us, meaning when they did motion capture for games and other action sequences, it was about the stunt, not about the facial performance. So you could either keyframe animate the face or grab a separate motion capture session where the actor sits still and mimes a facial performance, and a technical animator would glob the two pieces together. But in our movie we have these four children interacting with each other on this adventure, so it didn't make sense to capture everyone separately. So conceptually, what we needed was to create a place where you can get four actors together, where they can look in any direction at each other and you can record the performance.
That was the design spec. At that point, we contacted a number of motion capture equipment makers and talked to them about what we wanted to do. One of them told us it couldn't be done because there was too much data to capture, because we were going to use a full marker set on the face at this point. Nobody had done what we were talking about, which was really odd. Also a little frightening. One of the main problems we had to overcome was how the cameras could take in so many facial markers. Our system alone has 152 facial markers. What we ended up doing was working with Vicon to develop their software so that it could take in the amount of data that we're talking about at a quality you could reconstruct without a lot of noise in the markers.
BD: So what was the breakthrough here?
JC: The breakthrough was coming up with the number of cameras and the configuration of the cameras and the pipeline after you've gathered the data to apply it to the characters. It turned out that we needed 72 cameras to provide coverage in this capture zone so we could grab four actors and their facial and body markers together. So that's (152 + 48) markers per person x 4. I think that's 80 gigabytes per minute. There's a lot of other technology that had to be created to manage it: bookkeeping things for the data to be processed and visualized.
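As a rough check on the figures Chen cites, the marker arithmetic can be worked through in a few lines of Python. This is only an illustrative sketch: the constant and function names are hypothetical, and the 80 gigabytes per minute figure is Chen's own estimate ("I think"), taken at face value rather than derived from anything about the capture system.

```python
# Figures quoted in the interview; all names here are hypothetical.
FACIAL_MARKERS = 152   # facial markers per actor
BODY_MARKERS = 48      # body markers per actor
ACTORS = 4             # actors captured together in the volume
GB_PER_MINUTE = 80     # Chen's estimate of the capture data rate


def markers_per_actor(facial=FACIAL_MARKERS, body=BODY_MARKERS):
    """Markers tracked on a single actor (face plus body)."""
    return facial + body


def total_markers(actors=ACTORS):
    """Markers tracked at once when all actors share the capture zone."""
    return markers_per_actor() * actors


def gb_per_second(gb_per_minute=GB_PER_MINUTE):
    """Convert the quoted per-minute data rate to a per-second rate."""
    return gb_per_minute / 60


if __name__ == "__main__":
    print("markers per actor:", markers_per_actor())   # 200
    print("markers in the volume:", total_markers())   # 800
    print("GB per second: %.2f" % gb_per_second())
```

In other words, the 72 cameras had to resolve 800 markers at once whenever all four actors were in the zone.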
BD: What about the performance challenge?
JC: Part of what you're doing is capturing Tom playing an eight-year-old boy [along with four other adults]. So already you have differences in how much a muscle moves on Tom's face in relation to what's happening on a child's face. We wanted to get a character that looked like Tom when he was younger. We started scanning his son, who actually looked more like Rita [Wilson, his mom]. But we also realized that we didn't want to make him into an actor. So we came up with a design that Bob liked, and as we started to apply motion to it, we made a couple tweaks so that the kid's eyebrows and mouth looked a little more like Tom's because the performance actually translated a lot better, because Tom has these really arched eyebrows and does a lot of acting with his forehead; he doesn't move his face that much. It's interesting how we were able to analyze his acting in that way.
BD: Talk about how the production process was split into different phases.
JC: You have the performance capture first, then the integration process. After a particular performance take is selected by Bob and his editor [Jeremiah O'Driscoll], it is sent to Imageworks and is ordered up. We then go through the process of applying the performance data to the digital character in a medium resolution. The digital characters are then placed into the virtual set and the props are put in. And at that point it goes into layout, which is similar to a traditional keyframe movie. So this is where we begin to talk about the point of view of the movie. What does he want the camera to be showing us? And one of Bob's trademarks is visual storytelling, so the camera is very important to him. And what was liberating about this digital process was he was able to concentrate totally on camerawork as a whole separate phase during performance capture. And you can't even begin editing yet because all you have at this point is video reference. So we created this process called Wheels where we brought in a real cameraman to teach computer animators how to act like cinematographers, and it would feel like operating a remote camera on a gearhead as if they were on a technocrane. So the wheels basically allow you to pan and tilt, and all we're doing is recording the input from those wheels that will drive this virtual camera later, so you get all the nuances of Bob's camerawork played back in realtime. We didn't want to keyframe the camera, which gives you a different look.
BD: What other new technology did you have to create?
JC: We had to create smoke and snow and water and all of the effects animation. I think this was one of the largest effects animation crews that we've had at Imageworks. Traditionally smoke and water effects take so long to look correct in the computer, but because we were going for a more stylized look, we decided to create a new renderer called Splat. This was our smoke and snow renderer, and it was very fast. We used old technology to create some tests of smoke to be composited in, and these passes took 16-20 hours a frame to render. The simulation to get the smoke movement was done pretty quickly, but to render it you really had to like the movement because you only had one chance, and we had hundreds of shots that required smoke. So this new renderer would take 20 seconds, which was huge. So that meant the effects artists could do a lot of iterations of movement and lighting until we really liked it. I thought the smoke and all those effects, those subtle interactions, turned out great; it was one of my favorite parts of the movie.
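To put those render times in perspective, the implied speedup can be worked out directly. The sketch below is illustrative only: the function name is hypothetical, and it assumes the 20-second figure is per frame, like the 16-20 hour figure it is being compared against (Chen does not say so explicitly).

```python
SECONDS_PER_HOUR = 3600


def speedup(old_hours_per_frame, new_seconds_per_frame=20):
    """How many new-renderer frames fit in the time of one old-renderer frame.

    Assumes both figures are per-frame render times.
    """
    return old_hours_per_frame * SECONDS_PER_HOUR / new_seconds_per_frame


if __name__ == "__main__":
    # The quoted range for the older smoke passes was 16-20 hours a frame.
    print("low end: %.0fx" % speedup(16))    # 2880x
    print("high end: %.0fx" % speedup(20))   # 3600x
```

Under that assumption, an artist could try roughly 2,900 to 3,600 variations of a frame in the time a single older pass took, which is what made iterating on movement and lighting practical.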
BD: What was different about your role here?
JC: It was hard, but Ken and I got to do a lot of fun things because we weren't just relegated to fitting our imagery into the movie; we got to make the entire image. We got to light it, we came up with a great look, we got to make decisions about character design and colors, and Bob was a great collaborator. In terms of the lighting, that's a touchy subject because the DP, Don Burgess, didn't design the lighting and a different DP [Robert Presley] shot it. So it's interesting how everyone's role becomes fractured. I don't know how to even define what a visual effects supervisor does anymore. There are so many different aspects.
BD: Particularly with the larger role that previs now plays.
JC: The interesting thing about previs in a CG movie is that the camerawork can become the shot. I love previs in live action because I know what lens I want to use, what kinds of rigs I need to build, but when I shoot it, it's always different because you have to deal with reality and you have all these different compromises. But here I only had to worry about getting it done on time.
BD: So how would you define what Polar Express is? Is it animation? Is it a hybrid?
JC: I don't have a clear answer. We didn't know what it was when we were doing it because everything is so different about it. I don't know how to categorize it. I mean, I was asked to cut the visual effects Academy reel yesterday. My reaction was, "Oh, really? This is visual effects?" So I have to think about it. I guess I have to cut more of a storytelling piece. You can't say it's an animated movie because from Bob's point of view, he directed actors, not a whole crew of animators.
BD: But if you look at the end result, it's CG animation.
JC: True, it's rendered in CGI.
BD: And there's lots of animation, including keyframed animals, and the eyes and mouths of the humans are keyframed and not part of the performance capture.
JC: Yes, that's another area of debate. Ken and I used the whole bag of tricks from visual effects that we've learned to create illusion. What's interesting is that all the different skill sets that are put into the movie break down the traditional barriers. But it's not like this was a photorealistic attempt. It's not going to replace the way we do movies. It's creating a new genre: movies that are not keyframed à la Pixar or DreamWorks, that have a different texture of movement that is more compelling in one sense.
BD: Where do you go from here on the next ImageMotion movie, Monster House?
JC: We have the next-generation Vicon system. The volume and all the techniques are bigger, which is better. We are no longer confined to a 10 x 10 area. I think Monster House is an intriguing example of this technology because the human characters are a little more caricatured, so artistically they have to find out how they move when their performances are driven by real people.
BD: What are some of the other improvements?
JC: It's technology; it's how long you can record for.
BD: And how much keyframe embellishment is there?
JC: I don't know. But we'll always have to do tongues because there isn't enough volume on the lips when you purse them, and eyeballs, until we have a way of tracking them better. We have better ways of doing the eyeballs already because you can't put a marker on them. Contact lenses don't work because they make you look like aliens and they swim over your eyeballs anyway, and you're getting incomplete data, so you might as well have an animator work on the eyeballs using a video reference. What's interesting is that the performance capture data had enough fidelity that you could actually see where the eyeball was looking just from the bulge in the eyelids. It's weird. And very complicated.
Bill Desowitz is editor of VFXWorld.