Contour Reality Capture Crosses Uncanny Valley


Take a look at what Contour can do. All images © Mova.

In a garage in Palo Alto, California, Steve Perlman crossed the Uncanny Valley with the help of a small team, some cameras and lights, and a little children's Halloween makeup. The result is the Contour Reality Capture System, a major evolution in motion capture and animation, sparked by technical innovations in What Dreams May Come and The Matrix trilogy, the latter offering the first use of markerless high-def facial capture for a widescreen feature.

A term coined by Japanese scientist Masahiro Mori, the Uncanny Valley describes an artificial character that is real enough to appear lifelike, yet still eerily robotic, causing an instinctive reaction of discomfort from the viewer. The imperceptible cues we learn from birth that portray a living soul are lacking. Recreating those cues has been the difficult task of motion capture, and even challenges the most skilled animator. Accurately recreating an existing living being without getting mired in that Uncanny Valley has been the Holy Grail.

Impressive Detail in Little Time

Perlman, who helped develop Apple's QuickTime and founded WebTV Networks, later purchased by Microsoft, is now CEO of Rearden Companies, where Contour was born. With some consultants and a dedicated team of four, he has developed Mova Contour's highly sophisticated software, which relies on the simplest of materials. Using two separate sets of cameras and fluorescent lights that are synchronized to work in unison, and an actor wearing an application of phosphorescent liquid base makeup, a 3D model can be delivered with lifelike movement to a 10th of a millimeter in detail. With a resolution capability of more than 100,000 polygons per frame and up to 120 frames per second, realistic motion capture with reference textures can be delivered in a turnaround time of only 24 hours. There is no offset from the skin surface, making the character photoreal, and multiple performers and props can be captured at once while moving about freely within the camera view.

The detail available is impressive, limited so far only by the resolution at which the data is captured. Perlman explains, "You'll find demos on the mova.com website at 1920x662 HDTV resolution that are one quarter of the actual resolution, which is 1,300 pixels tall; 100,000 polys is a conservative measure. Our demos are done at a tenth of a millimeter. A close-up on the mouth is one example we use a lot. We're wondering where the resolution limitation is, but we haven't found it yet."
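As a quick check on the figures Perlman quotes, the arithmetic can be worked through in a few lines. This is only a sketch: it assumes "one quarter" refers to pixel count, meaning each side of the demo frame is half the captured size. The full width that results is an inference, not a number given in the article.

```python
# Demo frame size quoted in the article.
demo_w, demo_h = 1920, 662

# Assumption: "one quarter of the actual resolution" means a quarter of the
# pixel count, so each dimension of the full frame is twice the demo's.
full_w, full_h = demo_w * 2, demo_h * 2

ratio = (demo_w * demo_h) / (full_w * full_h)

print(full_h)  # 1324, consistent with "1,300 pixels tall"
print(ratio)   # 0.25
```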

Child's Play

The process is simple. The phosphorescent makeup is applied with a makeup sponge over the actor's face, neck and clothes as required. The actor enters a space much like a holodeck on Star Trek: The Next Generation. There, an array of cameras and lights synchronized to flash faster than human perception surrounds the actor. The shape of the face and details of movement, down to the crinkling of the nose and the tamest of smiles or the shifting of fabric as the body moves, are captured in a high-density 3D model that requires almost no cleanup. Even normally obscured areas such as under the chin are included when an adequate number of cameras are used.

The natural unevenness of the sponged-on phosphorescent makeup facilitates the process. Contour uses the random patterns to triangulate. Take, for example, two cameras. One camera finds a specific spot on a cheek. The other camera scans the entire face until it finds that spot. If the patterns were not random, it would not be able to uniquely identify that area. In marker-based MoCap, all the dots look the same, creating ambiguity, and the manual labor required to clean up the captured data is costly. With Contour, there is no data cleanup because the random pattern disambiguates the captured data. Contour can also track points on the captured surfaces through time by looking for the same patterns from frame to frame.
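The reason a random pattern removes ambiguity can be shown with a toy example. The sketch below is not Mova's software. It is a brute-force patch search on synthetic images, using a sum-of-squared-differences score. The second camera view is simulated by a horizontal shift of 7 pixels. All names and sizes here are hypothetical, chosen only for illustration.

```python
import numpy as np

def ssd_map(patch, image):
    """Sum of squared differences between patch and every window of image."""
    ph, pw = patch.shape
    rows = image.shape[0] - ph + 1
    cols = image.shape[1] - pw + 1
    scores = np.empty((rows, cols))
    for r in range(rows):
        for c in range(cols):
            scores[r, c] = np.sum((image[r:r + ph, c:c + pw] - patch) ** 2)
    return scores

def exact_matches(patch, image, tol=1e-12):
    """Top-left (row, col) of every window that matches the patch."""
    hits = np.argwhere(ssd_map(patch, image) < tol)
    return [tuple(map(int, p)) for p in hits]

SHIFT = 7  # hypothetical horizontal offset between the two camera views

def second_view(img):
    """Simulate the second camera by shifting the image sideways."""
    return np.roll(img, SHIFT, axis=1)

rng = np.random.default_rng(0)
random_makeup = rng.random((40, 60))   # stands in for sponged-on makeup
regular_dots = np.zeros((40, 60))      # stands in for identical markers
regular_dots[::5, ::5] = 1.0

# The same 5x5 spot, seen by camera A, is searched for in camera B's view.
random_hits = exact_matches(random_makeup[10:15, 20:25], second_view(random_makeup))
dot_hits = exact_matches(regular_dots[10:15, 20:25], second_view(regular_dots))

print(random_hits)        # [(10, 27)]: one unambiguous match
print(len(dot_hits) > 1)  # True: identical dots match in many places
```

The random patch is found exactly once, 7 pixels from where camera A saw it. In a real stereo rig with known camera geometry, that offset is what allows depth to be triangulated. The regular dot grid matches in many places, which is the ambiguity described above.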


Many Uses

The potential of Contour has only begun to become apparent. Actors can be aged or made to look younger, which is why David Fincher is reportedly using Contour on his next feature, the reverse-aging fantasy The Curious Case of Benjamin Button, starring Brad Pitt. "It's less painful than plastic surgery, and less expensive!" laughs Perlman, who calls his process Volumetric Cinematography. Actors can be captured in high-res for feature film and low-res for videogames at the same time. Perlman has been approached to scan children's faces for family memories, and even to scan a bike racer to figure out optimum angles.

Online video site WOA.TV (The Woman Of Action Network) will be using Contour to research how women land during basketball. "We will be doing Contour Capture to show the difference between men's and women's method of landing to study the deformation of muscles. Women have a lower center of gravity, so they are going to do something differently than men."

Usually when you capture a surface, you want to see where points on a surface track from frame to frame. With a marker-based system, each attached marker is a vertex, or you might use a grease pencil to add spots to the face. Weeks later the animation team gets the information. Hopefully, it's complete, but often it's not, or they realize the tracked points are in the wrong locations. Contour developed Retrospective Vertex Tracking. With Contour, the animation team has a tool that allows them to specify where on the surface they want points tracked. Move that point, and it recalculates the point location. It's done in retrospect, after capture, so even if the data is used years later, the points can be moved around and the performance reused. The possibilities are considerable. Imagine Marlon Brando in Superman, returning as a complete character, including close-ups, in all the films to come.
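The idea of choosing tracked points after the capture can be illustrated with a small sketch. This is not Contour's Retrospective Vertex Tracking implementation, whose details the article does not give. It assumes the stored capture is a sequence of randomly textured frames and follows any chosen point by matching its surrounding patch from one frame to the next. The function name, patch size and search radius are hypothetical.

```python
import numpy as np

def track_point(frames, start, half=2, search=4):
    """Follow the patch centred at start (row, col) through all frames."""
    path = [start]
    for prev, cur in zip(frames, frames[1:]):
        r0, c0 = path[-1]
        patch = prev[r0 - half:r0 + half + 1, c0 - half:c0 + half + 1]
        best, best_pos = np.inf, (r0, c0)
        # Search a small neighbourhood of the last known position.
        for r in range(r0 - search, r0 + search + 1):
            for c in range(c0 - search, c0 + search + 1):
                window = cur[r - half:r + half + 1, c - half:c + half + 1]
                if window.shape != patch.shape:
                    continue  # window falls off the image edge
                ssd = np.sum((window - patch) ** 2)
                if ssd < best:
                    best, best_pos = ssd, (r, c)
        path.append(best_pos)
    return path

# Synthetic "capture": a random texture drifting 1 row down and 2 columns
# right per frame. It is stored once and can be queried any time afterwards.
rng = np.random.default_rng(1)
texture = rng.random((30, 30))
frames = [np.roll(texture, (t, 2 * t), axis=(0, 1)) for t in range(4)]

# Points are chosen after the fact; no markers were placed at capture time.
first = track_point(frames, (10, 10))
second = track_point(frames, (15, 8))

print(first)   # [(10, 10), (11, 12), (12, 14), (13, 16)]
print(second)  # [(15, 8), (16, 10), (17, 12), (18, 14)]
```

Because the whole textured surface is stored rather than a fixed set of marker positions, asking for a different point only means running the search again on the same frames.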

Contour can capture hands, and what the hands are doing. "Capturing a grip is very difficult in MoCap," Perlman explains. "If a performer grabs a mug, for example, the hard surface causes her hand to flex outward. When you see this motion, it acts as a subtle clue to your brain as to how stiff the object is she is grabbing." These are some of the cues that the Contour system successfully provides.

Synchronized to rapidly flashing lights, an array of cameras captures both the normally illuminated image of the performer and the glow of phosphorescent makeup. The system then uses the random phosphorescent patterns to produce a completely photoreal 3D image of the performer with submillimeter precision. Photo credit: Paul Trapani.

A Few Frontiers

Currently, there are still a few limitations. It is an optical system, so obscured areas won't be seen, and some parts of the hand at times will be missing. Contour partners have been working on solutions over the last year. Wet surfaces such as the eyes and mouth interior can't hold makeup. Perlman is currently experimenting with plastic teeth molds embedded with phosphor to increase feedback. The teeth themselves don't move, but they do move relative to the bottom jaw, and the lips move over the teeth. Adding the molds gives a nice reference for the skull and jaw movement and the interaction with the lips. Loose strands of hair are difficult to capture, and for large volumes more cameras are needed.

Additional cameras give a better view of the subject, and are preferable for captures in situations such as costumes with folds of cloth. But a capture of half a face can be managed with as few as two cameras, limited to half because the nose gets in the way. The rig at the SIGGRAPH 2006 exhibition had 41 cameras and the demos on the website use 44. Currently, two cameras feed into each computer, using FireWire ports. Contour is moving to gigabit Ethernet cameras that will plug into a switch. The change is expected to allow more cameras per computer. Contour is extremely computationally intensive. "Our first prototype took a week to reconstruct one frame, but after extensive software and hardware optimizations now one can be done in a few seconds."
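The hardware figures above translate into a rough machine count and speedup. The camera counts and the two-cameras-per-computer ratio come from the article. The four-per-computer case and the five-second reconstruction time are hypothetical stand-ins, since the article says only "more cameras per computer" and "a few seconds".

```python
import math

def computers_needed(cameras, cameras_per_computer):
    """Smallest number of computers that can host all the cameras."""
    return math.ceil(cameras / cameras_per_computer)

print(computers_needed(41, 2))  # 21: SIGGRAPH 2006 rig over FireWire
print(computers_needed(44, 2))  # 22: website demo rig over FireWire
print(computers_needed(44, 4))  # 11: hypothetical ratio with Ethernet cameras

SECONDS_PER_WEEK = 7 * 24 * 3600  # 604800
ASSUMED_SECONDS_NOW = 5           # assumption standing in for "a few seconds"
speedup = SECONDS_PER_WEEK / ASSUMED_SECONDS_NOW
print(speedup)                    # 120960.0
```

Under that assumption the per-frame reconstruction time has improved by roughly five orders of magnitude.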

The textures captured during the scan are currently only for reference. There is one camera in front and one on each side. There have been requests for maps such as albedo (straight image data with no lighting or shadows) and specular maps. Contour is looking into those possibilities now. States Perlman, "It's great to work with people because it's their requests that drive the system forward."


A New Landscape

Mova's Contour has stirred up quite a bit of interest in the industry. Where realistic digital characters have traditionally been reserved for long-distance action scenes, close-up conversations are now feasible. Films currently use digital directing, where the shoot is completed in two to three months and all the rest is post, requiring a hefty budget and making these films very high risk. Using Contour's method could streamline the process substantially. The industry is recognizing the potential. Softimage's Face Robot 1.5 will include a special import option for Contour. It's compatible with Autodesk's Maya, MotionBuilder and 3ds Max, as well as all Vicon software. And Contour is just the beginning of technologies being developed at Mova behind closed doors.

The companies that are considering Mova's Contour are still under wraps. "What Contour does for games is give them film-like reality," Perlman suggests. Other clients in the near future might include commercials that require a short turnaround time.

So far Mova is only selling the service, but may eventually begin to sell the hardware. Pricing will be offered in the third quarter of this year. The Contour system is less expensive than a marker-based setup, since it doesn't require special cameras, and the computers are off the shelf. It does, however, have very sophisticated software, capturing voxels instead of pixels. Perlman adds, "As we evolve the technology, a low-res Contour capture will be able to be viewed in real time. We are working to get to complete real-time volumetric cinematography."

This would give a director the same control as a conventional camera. The Uncanny Valley has finally been crossed, with the speed of phosphorescent light.

For more information on Mova's Contour, see www.mova.com.

Renee Dunlop has worked in film, games and multimedia since 1993. She currently works at Sony Pictures in Culver City, California, and freelances as a Maya lighting digital artist and as a writer for several trade publications.