Continuing our excerpts from the Inspired 3D series, Keith Lango presents part two of a two-part tutorial on lip-sync and facial animation.
This is the next in a number of adaptations from the new Inspired series published by Premier Press. Comprised of four titles and edited by Kyle Clark and Michael Ford, these books are designed to provide animators and curious moviegoers with tips and tricks from Hollywood veterans.
The single best investment you can have when animating lip-sync is a mirror. Watch yourself talking naturally, not the goofy-faced, play-acting, over-exaggerated face antics you think you should do, but the natural flowing conversation of everyday speech. Watch how your mouth shifts over sounds, seemingly skipping sounds altogether. This simple act of watching and learning how a mouth moves during speech can yield great results. It's all about observation and the incorporation of what you see into what you animate. Watch video reference and news anchors doing their nightly reports. Observe how wide and varied the mouth and face can become during speech and how much is skimmed over in pronunciation.
The key element in good lip-sync animation is to grasp the essential elements of the communication as recorded in the sound track. You need to squint your ears and try to pick up the overall feel of the speech rather than a slavish interpretation of what you think you hear in the dialogue. I call this a kind of Impressionism, having been inspired by the Impressionist movement of the late 1800s.
For many years, up until the late 19th century, the effort in Renaissance art was the meticulous and accurate re-creation of reality in every fine detail. Realism was the goal, and literalism in interpreting a painting was the norm. Then, artists got an idea about capturing the overall essence of an image. They became less interested in capturing every leaf on a tree and began to focus on how the light, shadow and color hues projected that tree into another realm. In this new interpretive realm, leaves didn't matter as much as form, color, tone and contrast. Just as the Impressionist painters got away from a literal realism in capturing a picture, animators need to become impressionistic when it comes to lip-sync animation. The best place to start is with the broad strokes of your brush, to block in the very foundation of your work. The best way to do this is to nail down your mouth open and close timings.
Open or Closed?
When you begin a lip-sync task, seek at first to do nothing more than hit the primary mouth opens and closes. By focusing on this basic need, you can get nearly all you need for lip-sync. Heck, the Muppets have gotten by on that for 30-plus years! These main target points are like the broad brushes in an Impressionist painting. They define shape, contrast, form and direction. The details of texture come later with the specific choices you make on top of the broad brushed open and closed pose shapes and timings.
The opens and closes are the foundation of your more specific choices. Even if all you do is properly hit the opens, closes, and wide shapes of the mouth at the right time, you are already more than 75% of the way to great lip-sync. You can get a lot out of very little lip-sync animation. If you have any doubts about this, take a look at animated films using projected texture map mouths, such as those in VeggieTales, which have proven that this is indeed true.
Literalism Versus Impressionism: A Case Study
In the film Mouse Hunt, Christopher Walken's character is mumbling about getting into the mouse's head. In this rambling, he says (in a rather understated fashion) that "you hafta get inside the mouse's mind." "You hafta get" takes about 25 frames to say. At first look, it literally seems like there should be the following keys for the phrase: Y (a pucker shape), Ooo, H, Aa, V, T, Uh, G, Eh, T.
That is a very literal interpretation of what it takes to show a person saying "you hafta get." However, if you keyframe the lip-sync in that manner, the result will be a very poppy or jerky mouth when animated. Some of those poses will be onscreen for only a single frame, which is entirely too much information and not enough time for the viewer to interpret it.
A quick analysis will show that you go from one mouth shape that is quite open (the "Ah" in "hafta") to a pretty closed one (the "F" in "hafta") and then back open again (for the end of "hafta"). The result is the mouth popping from open to closed and back to open in just three frames.
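The open-closed-open pop described above is easy to check for once the jaw motion is written down as one number per frame. The following is only a minimal sketch: the function name, the 0.0-to-1.0 jaw scale and both thresholds are assumptions made for illustration, not values from the book or from any animation package.

```python
def find_pops(jaw_open, open_threshold=0.6, closed_threshold=0.2):
    """Return the frames where the mouth goes open -> closed -> open
    across three consecutive frames.

    jaw_open holds one value per frame, from 0.0 (closed) to 1.0 (wide open).
    Both thresholds are illustrative assumptions.
    """
    pops = []
    for frame in range(1, len(jaw_open) - 1):
        before = jaw_open[frame - 1]
        here = jaw_open[frame]
        after = jaw_open[frame + 1]
        if (before >= open_threshold
                and here <= closed_threshold
                and after >= open_threshold):
            pops.append(frame)
    return pops


# Hypothetical jaw curves: a literal pass that snaps shut for one frame,
# and a softened pass that eases through the same stretch.
literal_pass = [0.1, 0.8, 0.1, 0.8, 0.3]
softened_pass = [0.1, 0.8, 0.5, 0.3, 0.6]

print(find_pops(literal_pass))   # [2]
print(find_pops(softened_pass))  # []
```

A check like this only flags candidates; whether a flagged frame really reads as a pop is still a judgment made by watching the shot at full speed.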
Oftentimes, beginners will make a phoneme that is an exact copy of one's face saying that single letter. So, to make "E" phonemes, you would say "E" by itself. To model "K" phonemes, you can base it off your own face in a mirror saying "kuh." At first this seems logical. The problem is that when you say the "t" sound by itself ("tuh"), your face doesn't look at all like it would if you said "skate." And that "t" in "skate" gives a face shape that is completely different from the "t" sound shapes in "pet store." And THAT "t" is very different from the "t" shape you make when you say "goatee." Figure 11 shows variations on the "t" shape.
[Figure 11] Working from left to right: the default t shape; the t shape in the word skate; the t shape in get; and the t shape in goat.
As such, it's imperative to remember that mouth shapes for sounds must be animated in context. The preceding sound shape affects the current sound shape. Likewise, the following sound shape is anticipated in the current sound shape.
So, the shapes shown must all be in context with the shapes and sounds that precede and follow them. If you get stuck on the idea of making all the "t" sounds in a sound track the same shape, regardless of the prior or following sound/shape context in the dialogue, then you're setting yourself up for a popping mouth when you go to view your animation. Animating speech is not animating letters; it's animating the flow of shapes that are needed to make the sounds.
If you can get the major impressions across in your animation, you can let the little stuff slide. Just like the impressionist would hint at a cluster of leaves with a single daub of his brush, you too should let words and sound shapes slur into the next word or sound shape. Mix the target facial weights together to show a flow. Get away from showing leaves and start showing contrast and form. Talking is more of a flowing thought than an alliterative function of letters.
Looking again at the example phrase, "you hafta get," a more impressionistic interpretation would be to emphasize the following major accents: Ooo, aaFF, Eh.
Go ahead and say that out loud. "Ooo" as in "scoop," "aaFF" as in "after," and "Eh" as in "pet." Ooo-aaFF-Eh.
It sounds a lot like "you hafta get," doesn't it?
Now go one further. Grab a handheld mirror and say "you hafta get." Watch how your mouth looks as you say it again. Now, say "ooo-aaFF-eh" a few times. See how very close the two are in how they look? Here is another example of this same principle: Say to your mirror, "I love you." Then say to it, "elephant shoes." The two look similar, don't they? Here's a breakdown of a few specific choices.
You'll want to start by letting the "yuh" of "you" flow into the more open "aa" at the beginning of "hafta." Skip the specific "ooo" at the end of "you," because it is not very strong. It's there, but it gets said while the mouth is transitioning into the beginning of "hafta." Basically, it slurs into the next word. The "h" of "hafta" is buried in the back of the throat, so the lips don't really need to show it.
Picking up from the moderately strong "aa" of "hafta," hit the "f" for two frames to let it read. It's the major closed point of the phrase, so that needs to line up and read clearly. Then skip the ending "ah" of "hafta" altogether, as well as the "g" of "get." Both happen under the breath; they're slurred under the transition from "ff" to the "eh" accent of "get." Hit that last open pose of "eh." Then end with an appropriately shaped nearly closed mouth to catch the idea of a "t." You've basically now animated Ooo-aaFF-Eht. And you know what? It flows, it feels natural, and it doesn't pop.
What about t's, d's, n's and such? Well, if your character has a tongue, you can get all the inner mouth sound shapes you need with that. The inner mouth sound shapes are as follows: L
Add your tongue work in here, keeping it as impressionistic as everything else, and you can handle the little stuff quite easily. A good tip is to keep tongue movements very quick. Don't have the tongue take longer than two frames to get from one position to another (unless you have a specific reason) or it will look like your character is saying the "LL" sound: the word "bad" turns into "bald" and "good" becomes "gold." Keep the tongue light and quick.
Miscellaneous Lip-Sync Tips to Keep in Mind
The amount of data available on lip-sync and animation is quite impressive. Animation is nearly 100 years old and much has been discovered and found to be generally reliable. As an additional aid to a good lip-sync technique, here are a few miscellaneous tips that have been discovered over the years:
- Don't go from wide open to closed in one frame and vice versa. Definitely don't go from open to closed to open in three frames.
- Don't hold a mouth shape static. An "Ah" shape should shift into a slightly different "Ah" as it's being held.
- Keep M's and F's for two frames. If space and timing are tight, steal from the previous sound.
- Keep an eye on your targets and make sure they're not too linear going from one sound shape to the next. Facial animation requires all the animation techniques that are applied to the body (breakdowns, arcs, overlap).
- Hit the sound shape at least two frames before the sound is heard. Even if you're right on the nose, it will feel late when played at full speed. Humans process sight faster than sound, so the audience will pick up the cues from the shape before the sound.
- Break up the mouth angles. Shift the mouth up and down, tilt it left or right, and get snarls in there. Show emotion as the character speaks. The character can speak and smile, speak and frown, and speak and yawn at the same time.
- Build rigs that allow you to keep that kind of life in your lip-sync animation.
- Upper teeth do not move; they're nailed to your skull.
- Jaws rotate, not slide, in characters with clearly defined head/ neck areas.
- Push your poses. Don't be afraid to go extreme. Avoid the Princess Fiona Final Fantasy Syndrome (translation: don't try to replicate photorealism). Keep the energy of the sound track in mind when you're doing the mouth shapes. Louder sounds with more energy should be shown with the mouth open wider and sound shapes more extreme. Watch TV announcers talk; their faces are very elastic and extreme at times.
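Two of the timing tips above (lead the sound by two frames, and give M's and F's two frames by stealing from the previous sound) can be sketched as a small key-placement pass. This is a rough illustration only: the function name, the (frame, shape) tuple layout and the sample frame numbers are all hypothetical, and the sketch does not try to resolve the case where stealing frames would collide with an earlier key.

```python
HOLD_SHAPES = {"M", "F"}  # shapes the tips say should read for two frames


def place_keys(sound_keys, lead=2, min_hold=2):
    """Turn (frame_sound_is_heard, shape) pairs into (key_frame, shape) pairs.

    1. Every shape is keyed `lead` frames before its sound is heard.
    2. An M or F with fewer than `min_hold` frames before the next key
       starts earlier, taking the time from the previous sound.
    """
    placed = [[max(0, frame - lead), shape] for frame, shape in sound_keys]
    for i in range(len(placed) - 1):
        start, shape = placed[i]
        next_start = placed[i + 1][0]
        if shape in HOLD_SHAPES and next_start - start < min_hold:
            placed[i][0] = max(0, next_start - min_hold)
    return [(frame, shape) for frame, shape in placed]


# Hypothetical frame numbers for where each sound lands on the track.
heard = [(4, "Aa"), (9, "F"), (10, "Eh"), (14, "T")]
print(place_keys(heard))
# [(2, 'Aa'), (6, 'F'), (8, 'Eh'), (12, 'T')]
```

Here the "F" would have had only one frame before "Eh," so it starts a frame earlier, shortening the "Aa" before it. The two-frame values come from the tips; everything else is a placeholder to adjust by eye.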
You've heard the quote, "the eyes are the window to the soul." After the body, the eyes are key in emotional communication. If you have great body animation and great eye-emotion animation, you don't need a mouth to get the point across. So, remember that the eyes must follow the same pattern for communication as the rest of your character.
[Figure 12] The character's face begins with a happy expression. The shift begins in his eyes, then moves to his lower face. The final expression of total anger is then revealed.
The eyes lead the rest of the face. I refer to this as cascading revelation. The revelation (revealing) of the change of inner emotions starts with the eyes and then cascades down the face into the mouth, then into the shoulders and spine. It's like the emotion in a character is flowing down the body, all starting with the eyes; they lead any emotional shift. Figure 12 shows this transformation of emotion.
The first place to get a sense of a character switching from happy to angry should be in the eyes. The rest of the face will come second, and the body will follow. From a motivational standpoint, a character must feel something before he can act something. So, you need to know what the character is feeling, then show that in animation. If a character is going to shift his feelings, you must show that inner shift in emotion in the eyes first. After that, you can move that emotional shift to the rest of the body.
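The ordering described here can be written down as a simple stagger of start frames, with the eyes leading. This is a minimal sketch: the region names follow the order given in the text, but the function name and the two-frame offset are assumptions chosen purely for illustration.

```python
def cascade_starts(eye_start, regions=("eyes", "mouth", "shoulders", "spine"),
                   offset=2):
    """Return the frame at which each region begins an emotional shift.

    The eyes start at `eye_start`; each region further down the body starts
    `offset` frames after the one before it. The offset is an illustrative
    assumption, not a value from the text.
    """
    return {region: eye_start + index * offset
            for index, region in enumerate(regions)}


print(cascade_starts(40))
# {'eyes': 40, 'mouth': 42, 'shoulders': 44, 'spine': 46}
```

In practice the gap between regions would likely vary with the character and the strength of the emotion; a uniform offset is just the simplest starting point to adjust from.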
A person's eyes cannot hide the inner realities of the heart. So while a person may put up a brave front, the eyes give away the keys to the soul. The lever to subtext in animation (subtext being the unsaid truth of the moment) is to show this truth in the eyes.
If your character's eyes are missing the mark emotionally, then your character is lying. You can show a brave body, but if the eyes show fear, then fear is what is true, no matter what the bluster of the body is saying with its pose. If that's what you want to portray, that's great. The worst possible thing is to have your character's eyes be off the mark by mistake. That's akin to using words out of context. (Hey, don't jump to contusions!) For this reason, it is imperative that you study and learn the values of expression in the eyes. There are several excellent resources for facial posing for emotional impact, one of the most popular being Gary Faigin's book The Artist's Complete Guide to Facial Expression. However, the best resource is studying real life. Watch people, watch good films (not junk films), and study great acting performances to instill new and varied meanings into your animation vocabulary. By expanding your vocabulary, you'll be able to broaden your ability to speak to the viewer through the eyes of your character.
Blinks are one of those odd areas in facial animation. There are good rules to follow, but they're never rock-solid, "never shall we violate these rules under penalty of death and public shame" kinds of rules. They're merely little things to keep in mind when you're animating blinks. A few of these are:
- blink on head turns
- blink on eye shifts
- blink once every 30 frames or so
- standard blinks are two frames to close, one closed and five frames to open (at 24 fps)
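The standard blink timing in the list above can be laid out as eyelid keyframes. The sketch below reads "two frames to close, one closed and five frames to open" as three interval lengths, which is one possible interpretation; the function names and the lid scale (0.0 open, 1.0 closed) are assumptions for illustration.

```python
def blink_keys(start, close_frames=2, closed_frames=1, open_frames=5):
    """Return (frame, lid) keyframes for one blink beginning at `start`.

    lid is 0.0 for fully open and 1.0 for fully closed. The default interval
    lengths are the rule-of-thumb timing for 24 fps.
    """
    shut = start + close_frames
    reopen_from = shut + closed_frames
    return [
        (start, 0.0),                      # lid still open
        (shut, 1.0),                       # lid reaches closed
        (reopen_from, 1.0),                # end of the closed hold
        (reopen_from + open_frames, 0.0),  # lid fully open again
    ]


def mechanical_blink_starts(total_frames, interval=30):
    """Evenly spaced start frames for blinks, roughly one every `interval`."""
    return list(range(interval, total_frames, interval))


print(blink_keys(10))
# [(10, 0.0), (12, 1.0), (13, 1.0), (18, 0.0)]
print(mechanical_blink_starts(100))
# [30, 60, 90]
```

Evenly spaced blinks are only a placeholder; start frames would normally be moved onto head turns and eye shifts, or wherever the character's thinking calls for one.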
As I said, these are generalities that may or may not work. The key to blinks, like everything else, is to think about them in context. A blink can be merely a mechanical process for moistening the eyes, in which case it's unconscious and involuntary. This kind of blink shouldn't draw attention to itself; it is merely there to keep the character alive. But a blink can also be intentional and motivated. Oftentimes, the inner emotions and thoughts of a character will drive a blink or two that isn't strictly meeting a physiological eyeball-hydration need. A good example of this is a person who has just received shocking news. They may cast down their gaze in response and also blink quickly several times. This is a purely emotional, thought-driven reaction. Blinks help humans make the jump from one thing to the next in the mind. Similar to a cut from shot to shot in a film, a blink is a cut from one thought or topic to another. It makes sense to blink when a character turns his head or shifts his eyes. He is clearing the slate in his mind. It's real-time editing in your mind. If a character is struggling to focus, or is trying to process difficult information ("Your father has just died"), he will tend to blink more often in an attempt to clear away the confusion and find clarity in his mind. This gives voice to the rule that a person will double-blink when lying. The person knows he is lying, and he has to work harder in his mind to gather the lie. He'll blink for clarity.
I usually create blinks last when animating a face, unless the blink is the primary action in the shot. By then your character's body, mouth, eyes, and face should be communicating very clearly what needs to be said. You don't want the audience to notice blinks, unless they're supposed to for story reasons. Blinks in face animation are a lot like seasoning. You add them to round out the flavor of the acting, not as a foundation for it. But again, this is a generality. There will always be the occasion where the blink is the primary action of the shot or has a heightened level of significance.
Here's an example of how I generally approach facial animation. This is a checklist I use to complete a facial animation assignment.
Listen to the sound track.
Write down the feelings of the character at certain moments. (Know what you want to say.)
Watch yourself saying it.
Look at body poses.
Sketch face poses that work for that feeling. (Know what words to use to say it.)
Plan eye direction to match emotion.
Keyframe eye direction.
Get open/close timing of jaw to match voice track.
Get other detailed lip-sync.
Pick key moments of emotion and keyframe the face as a whole to match the emotion (whole face keyframing).
Modify lip-sync with emotional modifiers (happy, sad, angry, nervous).
Offset the animation, revealing inner realities with the eyes first, then cascading the emotion down into the lower face and then the body.
Do the blinks.
By planning your work and working your plan, you can bring more thought and preparation to your animation. Good planning and good execution go hand in hand in helping to achieve successful facial animation. Always remember: Failing to plan is planning to fail.
To learn more about posing and staging, character animation, walks, tools of the trade and other topics of interest to animators, check out Inspired 3D Character Animation by Kyle Clark; series edited by Kyle Clark and Michael Ford: Premier Press, 2002. 268 pages with illustrations. ISBN 1-931841-48-9 ($59.99) Read more about all four titles in the Inspired series and check back to VFXWorld frequently to read new excerpts.
Keith Lango is the computer graphics supervisor of the feature film at Big Idea Prods. Inc. in Chicago, makers of the top-selling children's video property VeggieTales and 3-2-1 Penguins! Keith got his start in CG in the early '90s and has held positions as an illustrator, a senior animator, an animation supervisor, an assistant director, a CG supervisor and a writer. Keith has also co-authored and co-illustrated a children's book as well as personally developed several award-winning short animated films. He lives happily with Kim (his wife of 14 years) and his three children: Candice, Laura and John Mark.
Author and series editor Kyle Clark (left) and series editor Mike Ford (right).
Series editor Kyle Clark is a lead animator at Microsoft's Digital Anvil Studios and co-founder of Animation Foundation. He majored in film, video and computer animation at USC and has since worked on a number of feature, commercial and game projects. He has also taught at various schools, including San Francisco Academy of Art College, San Francisco State University, UCLA School of Design and Texas A&M University.
Series editor and author Michael Ford is a senior technical animator at Sony Pictures Imageworks and co-founder of Animation Foundation. A graduate of UCLA's School of Design, he has since worked on numerous feature and commercial projects at ILM, Centropolis FX and Digital Magic. He has lectured at the UCLA School of Design, USC, DeAnza College and San Francisco Academy of Art College.