
Stroke of Genius: NVIDIA Researchers Debut GauGAN at GTC 2019

Offering a powerful tool for creating virtual worlds, NVIDIA’s interactive ‘smart paintbrush’ app harnesses generative adversarial networks to convert rough doodles into stunning photorealistic landscapes.

Kicking off this week’s GPU Technology Conference, underway now through Thursday, March 21 in San Jose, Calif., NVIDIA has announced a new interactive app called GauGAN -- a lighthearted nod to the post-Impressionist painter Paul Gauguin -- that employs deep learning models to convert rough doodles into photorealistic masterpieces with breathtaking ease. The tool leverages generative adversarial networks, or GANs, to translate segmentation maps into lifelike images.

GauGAN could offer a powerful tool for creating virtual worlds to everyone from architects and urban planners to landscape designers and game developers. With an AI that understands how the real world looks, these professionals could better prototype ideas and make rapid changes to a synthetic scene.

“It’s much easier to brainstorm designs with simple sketches, and this technology is able to convert sketches into highly realistic images,” said Bryan Catanzaro, vice president of applied deep learning research at NVIDIA.

Catanzaro likens the technology behind GauGAN to a “smart paintbrush” that can fill in the details inside rough segmentation maps, the high-level outlines that show the location of objects in a scene. GauGAN lets users draw their own segmentation maps and manipulate the scene, labeling each segment as sand, sky, sea or snow. Trained on a million images, the deep learning model then fills in the landscape with showstopping results: Draw in a pond, and nearby elements like trees and rocks will appear as reflections in the water. Swap a segment’s label from “grass” to “snow” and the entire image changes to a winter scene, with a formerly leafy tree turning barren.
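To make that workflow concrete, here is a minimal sketch in PyTorch of how a hand-drawn segmentation map becomes a network’s input. The label set and the Generator class are hypothetical stand-ins for illustration, not NVIDIA’s actual model:

```python
# Sketch of the segmentation-map workflow described above.
# LABELS and Generator are illustrative assumptions, not NVIDIA's model.
import torch
import torch.nn as nn

LABELS = {"sky": 0, "sea": 1, "sand": 2, "snow": 3}
NUM_LABELS = len(LABELS)

# A "doodle": a 2D grid where every pixel holds a class label.
seg_map = torch.full((256, 256), LABELS["sky"], dtype=torch.long)
seg_map[128:, :] = LABELS["sea"]  # paint the bottom half as sea

# Generators typically consume the map as one-hot channels, one per label.
one_hot = nn.functional.one_hot(seg_map, NUM_LABELS)      # (256, 256, 4)
one_hot = one_hot.permute(2, 0, 1).unsqueeze(0).float()   # (1, 4, 256, 256)

class Generator(nn.Module):
    """Toy generator: maps label channels to an RGB image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(NUM_LABELS, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, x):
        return self.net(x)

image = Generator()(one_hot)  # (1, 3, 256, 256) synthetic landscape
```

Relabeling pixels -- say, reassigning the “sea” region to “snow” -- and re-running the generator is all it takes to transform the scene, which is what makes the rapid what-if edits described above possible.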

“It’s like a coloring book picture that describes where a tree is, where the sun is, where the sky is,” Catanzaro said. “And then the neural network is able to fill in all of the detail and texture, and the reflections, shadows and colors, based on what it has learned about real images.”

Despite lacking an understanding of the physical world, GANs can produce convincing results because of their structure as a pair of competing networks: a generator and a discriminator. The generator creates images that it presents to the discriminator. Trained on real images, the discriminator coaches the generator with pixel-by-pixel feedback on how to improve the realism of its synthetic images. Having seen real ponds and lakes, the discriminator knows they contain reflections -- so the generator learns to create a convincing imitation. The tool also lets users apply a style filter, changing a generated image to adopt the style of a particular painter, or to turn a daytime scene into a sunset.
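For readers curious how that coaching works mechanically, here is a minimal, self-contained sketch of a standard GAN training step in PyTorch. The tiny G and D networks and the image size are illustrative assumptions, not NVIDIA’s architecture:

```python
# Generic GAN training step: the discriminator learns to tell real from
# fake, and the generator learns to fool it. Toy networks for illustration.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_images):              # real_images: (batch, 784)
    batch = real_images.size(0)
    fake_images = G(torch.randn(batch, 16))

    # Discriminator: score real images as 1, generated images as 0.
    opt_d.zero_grad()
    d_loss = (bce(D(real_images), torch.ones(batch, 1)) +
              bce(D(fake_images.detach()), torch.zeros(batch, 1)))
    d_loss.backward()
    opt_d.step()

    # Generator: adjust weights so the discriminator scores fakes as real --
    # the feedback that gradually pushes synthetic images toward realism.
    opt_g.zero_grad()
    g_loss = bce(D(fake_images), torch.ones(batch, 1))
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```

In GauGAN’s case the generator is conditioned on the user’s segmentation map rather than plain random noise, but the adversarial feedback loop is the same in spirit.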

“This technology is not just stitching together pieces of other images, or cutting and pasting textures,” Catanzaro said. “It’s actually synthesizing new images, very similar to how an artist would draw something.”

While the GauGAN app focuses on nature elements like land, sea and sky, the underlying neural network is capable of filling in other landscape features, including buildings, roads and people. The research paper behind GauGAN has been accepted as an oral presentation at the CVPR conference in June -- a recognition bestowed on just 5 percent of more than 5,000 submissions.

Source: NVIDIA


Formerly Editor-in-Chief of Animation World Network, Jennifer Wolfe has worked in the Media & Entertainment industry as a writer and PR professional since 2003.